1 Introduction

Nowadays, computer-aided simulations constitute an important tool for solving practical problems in many areas, such as physics, mathematics, biology, and chemistry. However, in order to simulate a model of a real process that has a random component, we need algorithms that generate random variables in a specified way. Two important approaches should, in particular, be mentioned: Monte Carlo methods and resampling algorithms. In the first case, a sample of iid (independent, identically distributed) random variables is used to solve a given problem that is too complex for analytical evaluation (see, e.g., Robert and Casella (2004) for additional details and further discussion). In the second case, an initial sample of observed values is reused to create a secondary sample in order to, e.g., calculate the standard error of some complicated test statistic (see, e.g., Efron (1982) for an introduction and various examples). Resampling techniques, commonly known as bootstrap methods, are especially useful in inferential problems of mathematical statistics. They are used to estimate probability distributions of sample statistics when analytical methods are either too complex or unavailable.

The same applies to problems in which we have to deal with complex phenomena that are characterized not only by randomness but also by fuzzy imprecision. In such cases, for the description of the considered models we may use the concept of a fuzzy random variable, which may be regarded as an imprecise (fuzzy) counterpart of the well-known crisp (e.g., real-valued) random variable.

Simulation methods for fuzzy random variables strongly depend upon their interpretation. There exist several mathematical models of fuzzy random variables. For their description, the reader is referred to the pertinent literature, such as, e.g., the very good monograph by Couso et al. (2014) or the overview papers (Gil and Hryniewicz 2009; Gil et al. 2006a). The most popular interpretation of fuzzy random variables, known as “epistemic”, is based on the model proposed in the papers by Kwakernaak (1978, 1979). In this model, a fuzzy random variable describes an imprecise (fuzzy) perception of an unobserved crisp random variable. The “epistemic” model of fuzzy random variables has been applied for solving many real-life problems using computer-aided simulations. Its applications were described in numerous papers. For example, the authors of this paper used it to solve problems from such different areas as: pricing of financial and insurance instruments (Nowak and Romaniuk 2013, 2017), estimation of the maintenance costs of a water distribution system (Romaniuk 2016, 2018) or Bayesian statistical decisions in reliability (Hryniewicz et al. 2015). The simulations considered in these papers usually consisted of generating random hidden crisp origins and the respective membership functions (e.g., in the form of triangles with edges of random length). Successful applications of simulation methods in the case of “epistemic” fuzzy random variables can be explained using the results of Hryniewicz (2015), who noticed that fuzzy random variables defined according to the definition of Kwakernaak can be described in a fully probabilistic way using infinite-dimensional probability distributions.

The second popular definition of a fuzzy random variable was proposed in the seminal paper by Puri and Ralescu (1986). This definition is based on the notions of set-valued mappings and random sets, and its interpretation is called “ontic.” Simulation of “ontic” fuzzy random variables is much more difficult than in the case of “epistemic” ones. The main reason for this is the nonexistence of the concept of a probability distribution that describes fuzzy random “ontic” observations. We may define a classical probability distribution only for certain sample characteristics, such as the sample mean, but not for fuzzy observations. Moreover, for “ontic” fuzzy random variables, popular measures of variability (such as variance) do not exist, and other measures should be used instead. All these interpretational problems make the simulation of fuzzy random variables having the “ontic” interpretation much more difficult. For example, Colubi et al. (2002) considered simulation methods for different types of both one- and multidimensional fuzzy variables in this setting. They used these methods for the analysis of the asymptotic behavior of a fuzzy arithmetic mean, expressed in terms of the strong law of large numbers and of the law of the iterated logarithm. The process of simulation itself was thoroughly examined in the paper by González-Rodríguez et al. (2009). They proposed two different approaches, based on the concept of support functions. The first one is related to simulations of Hilbert space-valued random elements with a projection on the cone of all fuzzy sets. The second one imitates the representation of elements of a separable Hilbert space with respect to an orthonormal basis directly on the space of fuzzy sets. A comparison of both approaches showed that the second method is more adequate for modeling realistic situations.

The lack of a natural probability distribution of “ontic” fuzzy random variables makes resampling (bootstrap) methods a valuable alternative to Monte Carlo simulation. Indeed, bootstrap methods have been successfully used in statistical tests about the expected value of a fuzzy random variable (see, e.g., Gil et al. 2006b; González-Rodríguez et al. 2006; Montenegro et al. 2004), and in other types of statistical tests in a fuzzy environment (see, e.g., Ramos-Guajardo et al. 2010; Ramos-Guajardo and Lubiano 2012). In the aforementioned papers, bootstrap samples enable the authors of the considered statistical tests to estimate the nominal significance level of a test via the empirical percentage of rejections of a true null hypothesis. In this approach, a bootstrap-based estimator serves as an empirical benchmark for the considered statistical test. Another bootstrap method, namely the weighted bootstrap, was used by Hung (2006) in the construction of the minimum inaccuracy fuzzy estimator, the calculation of its standard error, and the construction of appropriate confidence intervals.

A classical bootstrap approach has one disadvantage, which appears when the original fuzzy sample is small. In such a case, a bootstrap sample consists of only a few distinct values. This could be an obstacle when this sample has to be used in the modeling of complex phenomena, described, e.g., by complex functions of fuzzy random variables. Therefore, a modification of this method, aimed at increasing the diversity of the simulated results, seems to be needed. This is the main goal of this paper, in which we propose a new approach for the simulation of quasi-bootstrap populations of fuzzy random variables. Our approach is applicable to both “epistemic” and “ontic” fuzzy random variables.

Our idea to simulate more diverse fuzzy random populations is implemented in this paper in the form of two new nonparametric resampling methods for the so-called left–right fuzzy numbers (LRFNs). The proposed approach consists of two steps. In the first step, we use classical resampling methods in which a primary sample of fuzzy observations is reused in order to randomly generate a secondary sample. In the second step, we apply random perturbations to the membership functions of the fuzzy elements of the secondary sample. Therefore, our algorithms generate fuzzy numbers that may differ from the fuzzy numbers included in the original primary sample. Our new methods of resampling may be considered a kind of bootstrap-like generation methods. The fuzzy numbers generated in the secondary sample may thus be “not exactly the same”, but rather “similar” (in some mathematical sense) to the values from the primary sample. Therefore, they have greater diversity when compared to secondary samples simply resampled from primary samples, but still possess the same main statistical characteristics.

In order to measure the aforementioned similarity and the diversity of a new secondary sample, we compare the generated sample and the input set using four types of measures of similarity and two types of triangular fuzzy numbers. The applicability of the strong law of large numbers and the law of the iterated logarithm as indicators of convergence of the generated samples is also checked. Of course, LRFNs generated using the proposed methods should have some practical value and should be applicable in solving real-life problems. Therefore, we discuss an application of the introduced methods in two bootstrapped versions of statistical tests about the mean of a population of fuzzy numbers. An empirical p-value of these tests serves as a benchmark in the performed comparisons.

The paper is organized as follows. In Sect. 2, basic definitions of fuzzy sets and fuzzy random variables are recalled. Moreover, we present descriptions of the statistical tests used for testing hypotheses about the expected value. Next, in Sect. 3, we describe the proposed algorithms for the generation of bootstrap-like secondary samples. Then, in Sect. 4, we describe the results of the experimental verification of the properties of the proposed procedures. The application of the proposed new bootstrap procedures in statistical testing is presented in Sect. 5. The paper is concluded in its last section.

2 Mathematical preliminaries

2.1 Fuzzy numbers and random fuzzy numbers

Let us present basic definitions and notation, concerning the simulation of fuzzy random variables, which will be used in this paper. Additional details can be found in, e.g., Gil and Hryniewicz (2009) and Gil et al. (2006a).

Definition 1

A fuzzy number \({\tilde{a}}\) is a fuzzy subset of \({\mathbb {R}}\) for which \(\mu _{{\tilde{a}}}\) is a normal, upper-semicontinuous, fuzzy convex function with a compact support.

Then, for each \(\alpha \in [0,1]\), the \(\alpha \)-level set (\(\alpha \)-cut) \({\tilde{a}} (\alpha )\) is a closed interval of the form \({\tilde{a}} (\alpha ) =[a^L (\alpha ),a^R (\alpha )]\), where \(a^L (\alpha ),a^R (\alpha ) \in {\mathbb {R}}\) and \(a^L (\alpha ) \le a^R (\alpha )\). In particular, \({\tilde{a}} (0)\) is the closure of the set \(\{x:\mu _{{\tilde{a}}}\left( x\right) >0\}\), i.e., the support of \({\tilde{a}}\).

A left–right fuzzy number (which is further abbreviated as LRFN) is a fuzzy number with the membership function of the form

$$\begin{aligned} \mu _{{\tilde{a}}}\left( x\right) = {\left\{ \begin{array}{ll} L \left( \frac{x-a}{b-a}\right) &{}\quad \text {if } x \in [a,b] \\ 1 &{}\quad \text {if } x \in [b,c] \\ R \left( \frac{d-x}{d-c}\right) &{}\quad \text {if } x \in [c,d] \\ 0 &{}\quad \text {otherwise} \end{array}\right. }, \end{aligned}$$

where \(L, R: [0,1] \rightarrow [0,1]\) are non-decreasing functions such that \(L(0) = R(0) =0\) and \(L(1)=R(1) = 1\). Some examples of LRFNs are shown in Fig. 1.

Fig. 1 Examples of LRFNs

A triangular fuzzy number \({\tilde{a}}\), denoted further by \(\left[ a^L,a^C,a^R \right] \), is an LRFN with the membership function of the form

$$\begin{aligned} \mu _{{\tilde{a}}}\left( x\right) ={\left\{ \begin{array}{ll} \frac{x-a^L}{a^C-a^L} &{}\quad \text {if } x \in \left[ a^L,a^C \right] \\ \frac{a^R-x}{a^R-a^C} &{}\quad \text {if } x \in \left[ a^C,a^R \right] \\ 0 &{}\quad \text {otherwise} \end{array}\right. }, \end{aligned}$$

where \(a^L\) is the left end of its support, \(a^C\) is its core, and \(a^R\) is the right end of its support. Some examples of triangular fuzzy numbers are shown in Fig. 2.

Fig. 2 Examples of triangular fuzzy numbers
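The membership function and the \(\alpha \)-cuts of a triangular fuzzy number follow directly from the formulas above. The following minimal Python sketch (the function names are ours, introduced only for illustration) assumes a non-degenerate triangle, i.e., \(a^L< a^C < a^R\):

```python
def triangular_membership(x, aL, aC, aR):
    """Membership value of x for the triangular fuzzy number [aL, aC, aR];
    assumes aL < aC < aR."""
    if aL <= x <= aC:
        return (x - aL) / (aC - aL)
    if aC < x <= aR:
        return (aR - x) / (aR - aC)
    return 0.0

def alpha_cut(alpha, aL, aC, aR):
    """Closed interval [a^L(alpha), a^R(alpha)] of the triangular number [aL, aC, aR]."""
    return (aL + alpha * (aC - aL), aR - alpha * (aR - aC))
```

For example, for the triangular number [0, 1, 3] the 0.5-cut is [0.5, 2], and the 1-cut reduces to the core {1}.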

Fuzzy random variables are generalizations of ordinary (crisp) random variables. Historically, the first widely accepted definition of the fuzzy random variable was proposed by Kwakernaak (1978, 1979). Below, we present this definition in the version elaborated by Kruse and Meyer (1987).

Definition 2

(See Kruse and Meyer 1987) Let \(\left( \Omega ,{\mathcal {A}},P\right) \) be a probability space, where \(\Omega \) is the set of all possible outcomes of the random experiment, \({\mathcal {A}}\) is a \(\sigma \)-field of subsets of \(\Omega \) (the set of all possible events of interest), and P is a probability measure associated with \(\left( \Omega ,{\mathcal {A}}\right) \). A mapping \({\mathcal {X}}:\Omega \rightarrow {\mathcal {F}}_c({\mathbb {R}})\), where \({\mathcal {F}}_c({\mathbb {R}})\) is the space of all fuzzy numbers, is called a fuzzy random variable if it satisfies the following properties:

  (i)

    \(\left\{ {\mathcal {X}}_{\alpha }(\omega ): \alpha \in [0,1] \right\} \), where \({\mathcal {X}}_{\alpha }(\omega )=\left( {\mathcal {X}}(\omega )\right) _{\alpha }\), is a set representation of \({\mathcal {X}}(\omega )\) for all \(\omega \in \Omega \);

  (ii)

    for each \(\alpha \in [0,1]\) both \({\mathcal {X}}_{\alpha }^{L}:\Omega \rightarrow {\mathbb {R}}\) and \({\mathcal {X}}_{\alpha }^{U}:\Omega \rightarrow {\mathbb {R}}\), with \({\mathcal {X}}_{\alpha }^{L}(\omega ) =\inf {\mathcal {X}}_{\alpha }(\omega )\) and \({\mathcal {X}}_{\alpha }^{U}(\omega ) =\sup {\mathcal {X}}_{\alpha }(\omega )\), are usual real-valued random variables associated with \((\Omega ,{\mathcal {A}},P).\)

The fuzzy random variable defined according to Definition 2 has a seemingly natural interpretation. It may be considered a fuzzy perception of an unknown true real-valued random variable associated with a random experiment and referred to as ‘the original’ of the considered fuzzy random variable. This interpretation is called “epistemic” and allows us to process fuzzy data, interpreted as realizations of “epistemic” fuzzy random variables, in a relatively easy way. For example, the “epistemic” interpretation of a fuzzy random variable allows us to simulate fuzzy random data in a relatively direct way, i.e., without making many additional assumptions.

Another, and from the mathematical point of view more general, definition was proposed by Puri and Ralescu (1986).

Definition 3

(See Puri and Ralescu 1986). Given a probability space \((\Omega ,{\mathcal {A}},P)\), a mapping \({\mathcal {X}}: \Omega \rightarrow {\mathcal {F}}_c({\mathbb {R}})\) is said to be a fuzzy random variable (also referred to as a random fuzzy set) if for each \(\alpha \in [0,1]\) the set-valued mapping \({\mathcal {X}}_{\alpha }: \Omega \rightarrow {\mathcal {K}}_c({\mathbb {R}})\), where \({\mathcal {K}}_c({\mathbb {R}})\) is the class of non-empty compact intervals and \({\mathcal {X}}_{\alpha }(\omega )=({\mathcal {X}}(\omega ))_{\alpha }\) for all \(\omega \in \Omega \), is a compact convex random set (that is, a Borel-measurable mapping with respect to the Borel \(\sigma \)-field generated by the topology associated with the Hausdorff metric on \({\mathcal {K}}_c({\mathbb {R}})\)).

The fuzzy random variable defined by Definition 3 may be used for the analysis of random events (random data) that are intrinsically fuzzy. This interpretation is called “ontic” and may be used to process random data presented in the form of fuzzy random sets. Such fuzzy data exist in practice (see, e.g., the book by Viertl (2011) for examples), but their analysis is much more difficult. These difficulties also make the simulation of “ontic” fuzzy random data problematic.

2.2 Measures of similarity

To compare some properties of two fuzzy numbers, like the shape or location of their membership functions, one can use various measures of similarity. In this paper, we apply three classical measures, namely the supremum metric, the \(l_1\) metric, and the Hausdorff metric for fuzzy sets (see, e.g., Zwick et al. (1987) for additional details), as well as a more complex distance measure, which was introduced by Tran and Duckstein (2002). All of these measures will be used to compare the LRFNs generated using the methods proposed in Sect. 3 with the fuzzy numbers taken from an initial (primary) sample.

If \({\tilde{a}}\) and \({\tilde{b}}\) are fuzzy sets, then the supremum similarity measure, introduced in Nowakowska (1977), is defined for their membership functions \(\mu _{{\tilde{a}}} (x)\) and \(\mu _{{\tilde{b}}} (x)\), as

$$\begin{aligned} m_{\infty } \left( {\tilde{a}}, {\tilde{b}} \right) = \sup _x \left| \mu _{{\tilde{a}}} (x) - \mu _{{\tilde{b}}} (x) \right| . \end{aligned}$$

In the case of the \(l_1\) metric, proposed in Kaufman (1975), an appropriate measure is given by

$$\begin{aligned} m_{l_1} \left( {\tilde{a}}, {\tilde{b}} \right) = \int _{-\infty }^{\infty } \left| \mu _{{\tilde{a}}} (x) - \mu _{{\tilde{b}}} (x) \right| \mathrm{d}x. \end{aligned}$$

There are various ways to extend the Hausdorff distance to a metric for fuzzy sets (see, e.g., Zwick et al. 1987). In this paper, we will use this distance in the version proposed by Ralescu and Ralescu (1984), and defined as

$$\begin{aligned} m_H \left( {\tilde{a}}, {\tilde{b}} \right) {=} \int _{0}^{1}\!\!\max \left\{ \vert a^L (\alpha ) {-} b^L (\alpha )\vert , \vert a^R (\alpha ) - b^R (\alpha )\vert \right\} \mathrm{d} \alpha . \end{aligned}$$

The fourth distance measure considered in this paper was introduced by Tran and Duckstein (2002), and it is given by the following formula

$$\begin{aligned}&m_\mathrm{TD} \left( {\tilde{a}}, {\tilde{b}} \right) = \int _{0}^{1} \left( \left( \frac{a^L (\alpha ) + a^R (\alpha )}{2} - \frac{b^L (\alpha ) + b^R (\alpha )}{2} \right) ^2 \right. \nonumber \\&+\, \left. \frac{1}{3} \left( \left( \frac{a^L (\alpha ) - a^R (\alpha )}{2}\right) ^2 + \left( \frac{b^L (\alpha ) - b^R (\alpha )}{2}\right) ^2 \right) \right) \nonumber \\&\times \, w (\alpha ) \mathrm{d} \alpha \big / \int _{0}^{1} w (\alpha ) \mathrm{d} \alpha , \end{aligned}$$
(1)

where \(w (\alpha )\) is a certain weighting function. In this paper, we assume that \(w (\alpha ) = 1\), so each \(\alpha \)-cut in the measure (1) has the same significance (see Tran and Duckstein (2002) for other possible types of the weighting function and further discussion).
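For triangular fuzzy numbers, all four measures can be evaluated numerically from the formulas above. The sketch below is our own discretization (function names are ours); it assumes non-degenerate triangles with \(a^L< a^C < a^R\) and approximates the integrals with the composite trapezoid rule on uniform grids:

```python
import numpy as np

def membership(tri, x):
    """Vectorized membership function of a triangular number tri = (aL, aC, aR)."""
    aL, aC, aR = tri
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - aL) / (aC - aL), (aR - x) / (aR - aC)), 0.0, 1.0)

def cuts(tri, alphas):
    """Endpoints a^L(alpha), a^R(alpha) of the alpha-cuts of tri."""
    aL, aC, aR = tri
    return aL + alphas * (aC - aL), aR - alphas * (aR - aC)

def trapezoid(y, x):
    """Composite trapezoid rule on a uniform grid x."""
    return float((x[1] - x[0]) * (y.sum() - 0.5 * (y[0] + y[-1])))

def m_sup(a, b, grid):
    return float(np.max(np.abs(membership(a, grid) - membership(b, grid))))

def m_l1(a, b, grid):
    return trapezoid(np.abs(membership(a, grid) - membership(b, grid)), grid)

def m_hausdorff(a, b, alphas):
    aLv, aRv = cuts(a, alphas)
    bLv, bRv = cuts(b, alphas)
    return trapezoid(np.maximum(np.abs(aLv - bLv), np.abs(aRv - bRv)), alphas)

def m_td(a, b, alphas):
    """Tran-Duckstein measure (1) with w(alpha) = 1 (so the denominator equals 1)."""
    aLv, aRv = cuts(a, alphas)
    bLv, bRv = cuts(b, alphas)
    mid = ((aLv + aRv) / 2 - (bLv + bRv) / 2) ** 2
    spread = (((aLv - aRv) / 2) ** 2 + ((bLv - bRv) / 2) ** 2) / 3
    return trapezoid(mid + spread, alphas)
```

Note that, unlike the three metrics, \(m_\mathrm{TD}\) as given by (1) does not vanish for two identical non-crisp fuzzy numbers, since its second term aggregates the squared half-widths of the \(\alpha \)-cuts of both arguments.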

Of course, many other types of measures of similarity between two fuzzy sets have been proposed, and some of them are widely used. In the following, however, we will focus our attention only on the four measures of similarity mentioned above. Three of them (namely \(m_{\infty }, m_{l_1}, m_H\)) represent very “classical” and “standard” approaches. They have strict relationships with mathematical measures of similarity known and used for crisp values (i.e., real numbers), or with geometrical measures applied to points in space. Therefore, they are precisely and intuitively understood and easy to implement. Characteristics of many other measures are directly compared to the properties of these three. Obviously, these measures have some disadvantages, too. For example, the measure \(m_{\infty }\) takes into account only the supremum of the difference between the membership functions of two fuzzy sets. Therefore, even two “very similar” (in a broad sense of this word) fuzzy sets can differ significantly if this measure is applied.

We also consider the fourth measure, denoted by \(m_\mathrm{TD}\). According to its proposers (Tran and Duckstein 2002), this measure is specifically tailored for the LRFNs, which are our main objects of interest in this paper. Moreover, in Tran and Duckstein (2002), the authors enumerated many advantages of this measure, such as straightforward computation, facility of interpretation for decision makers, robustness, flexibility, and transitivity (see also Zhu and Lee 1992). After some comparisons, Tran and Duckstein conclude in Tran and Duckstein (2002) that this measure is at least as reasonable as other existing ones. From our point of view, the application of \(m_\mathrm{TD}\) in this paper seems to be a “one step further” (i.e., beyond “very classical” approaches) in comparison with the usage of the three previously mentioned measures.

In Sect. 4, we apply these four measures to check the similarity and diversity of triangular fuzzy numbers generated using the methods introduced in Sect. 3 and using the classical bootstrap approach. Of course, other types of measures can also be utilized for this purpose. However, quite surprisingly, the obtained results seem to behave in a very stable way without unexpected differences, regardless of the measure used. Therefore, it seems to us that the application of other measures of similarity should lead to more or less the same overall results.

2.3 Tests of the fuzzy mean value

There are many types of statistical tests for an expected value of a fuzzy random variable (see, e.g., Gil et al. 2006b; González-Rodríguez et al. 2006; Körner 2000; Montenegro et al. 2004). We focus on only two of them, which will be used in Sect. 5 as examples of application of the introduced nonparametric simulation methods.

The first considered test is an asymptotic test introduced in Körner (2000). Let us assume that \({\tilde{a}}\) is an LRFN with a core given by a single value. Then, we have

$$\begin{aligned} m_a = a^L (1) = a^R (1), \quad l_a = m_a - a^L (0), \quad r_a = a^R (0) - m_a. \end{aligned}$$

The \(d_2\) distance between two LRFNs \({\tilde{a}}\) and \({\tilde{b}}\) is defined as

$$\begin{aligned} d_2^2 \left( {\tilde{a}},{\tilde{b}}\right)= & {} \vert m_a - m_b \vert ^2 + R_2 \left| r_a - r_b \right| ^2 + L_2 \left| l_a - l_b \right| ^2 \\&+\, 2 \left( m_a - m_b \right) \left( R_1 \left( r_a - r_b \right) - L_1 \left( l_a - l_b \right) \right) , \end{aligned}$$

where

$$\begin{aligned} L_2 = \frac{1}{2} \int _{0}^{1} \left| L^{(-1)} (\alpha ) \right| ^2 \mathrm{d} \alpha , \quad L_1 = \frac{1}{2} \int _{0}^{1} L^{(-1)} (\alpha ) \mathrm{d} \alpha . \end{aligned}$$

The values of \(R_1\) and \(R_2\) are defined analogously (see Körner 2000).
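For triangular fuzzy numbers we have \(L(x) = R(x) = x\), hence \(L^{(-1)}(\alpha ) = \alpha \) and, from the integrals above, \(L_1 = R_1 = 1/4\) and \(L_2 = R_2 = 1/6\). A minimal sketch of the squared \(d_2\) distance under this triangular assumption (the function name is ours):

```python
# Shape-function constants for triangular numbers: L^{(-1)}(alpha) = alpha,
# so L1 = R1 = (1/2) * (1/2) = 1/4 and L2 = R2 = (1/2) * (1/3) = 1/6.
L1 = R1 = 0.25
L2 = R2 = 1.0 / 6.0

def d2_squared(a, b):
    """Squared d_2 distance between triangular numbers a = (aL, aC, aR), b = (bL, bC, bR)."""
    m_a, l_a, r_a = a[1], a[1] - a[0], a[2] - a[1]
    m_b, l_b, r_b = b[1], b[1] - b[0], b[2] - b[1]
    return (abs(m_a - m_b) ** 2 + R2 * abs(r_a - r_b) ** 2 + L2 * abs(l_a - l_b) ** 2
            + 2 * (m_a - m_b) * (R1 * (r_a - r_b) - L1 * (l_a - l_b)))
```

For two crisp numbers (zero spreads), \(d_2^2\) reduces to the squared Euclidean distance between their cores.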

For this type of distance, we have the following corollary, which was proved in Körner (2000):

Corollary 1

Let \(X_1, X_2, \ldots , X_n\) be a sample of LRFNs. Then

$$\begin{aligned} n d_2^2 \left( {\bar{X}}, {{\mathrm{{\mathbb {E}}}}}X\right) \xrightarrow [n \rightarrow \infty ]{} \lambda _1 \xi _1^2 + \lambda _2 \xi _2^2 + \lambda _3 \xi _3^2, \end{aligned}$$

where \(\xi _1, \xi _2, \xi _3\) are independent N(0, 1)-distributed random variables and \(\lambda _1, \lambda _2, \lambda _3\) are the eigenvalues of the matrix

$$\begin{aligned} \begin{pmatrix} C_{m_X m_X} - L_1 C_{l_X m_X } + R_1 C_{r_X m_X} &{} L_2 C_{l_X m_X} - L_1 C_{m_X m_X} &{} R_1 C_{m_X m_X} + R_2 C_{r_X m_X} \\ C_{l_X m_X} - L_1 C_{l_X l_X} + R_1 C_{r_X l_X} &{} L_2 C_{l_X l_X} - L_1 C_{l_X m_X} &{} R_1 C_{l_X m_X} + R_2 C_{r_X l_X}\\ C_{r_X m_X} - L_1 C_{r_X l_X} + R_1 C_{r_X r_X} &{} L_2 C_{r_X l_X } - L_1 C_{r_X m_X } &{} R_1 C_{r_X m_X} + R_2 C_{r_X r_X} \end{pmatrix}, \end{aligned}$$

where \(C_{zy} = {{\mathrm{{\mathbb {E}}}}}(z - {{\mathrm{{\mathbb {E}}}}}z) {{\mathrm{{\mathbb {E}}}}}(y - {{\mathrm{{\mathbb {E}}}}}y)\) for \(z,y \in \{ m_X, l_X, r_X\} \). Moreover, an asymptotic test of the hypothesis

$$\begin{aligned} H_0 : {{\mathrm{{\mathbb {E}}}}}X = {\tilde{V}} \text { against } H_1 : {{\mathrm{{\mathbb {E}}}}}X \not = {\tilde{V}} \end{aligned}$$

is formulated as follows: reject \(H_0\) if

$$\begin{aligned} nd_2^2 \left( {\bar{X}}, {\tilde{V}}\right) > \omega ^2_{1- p}, \end{aligned}$$

where \(\omega ^2_{q}\) is the q-th quantile of an \(\omega ^2\) distribution with respect to the eigenvalues \(\lambda _1, \lambda _2, \lambda _3\).

The above-mentioned \(\omega ^2\) distribution has a rather complex structure, which is known only for some special cases (see Körner 2000).

The second considered test was developed in González-Rodríguez et al. (2006) and Montenegro et al. (2004). It is based on a metric introduced in Bertoluzza et al. (1995), which was generalized in Körner and Näther (2002). The \(D^\varphi _W\) metric for two LRFNs \({\tilde{a}}, {\tilde{b}}\) is defined as

$$\begin{aligned} D^\varphi _W \left( {\tilde{a}}, {\tilde{b}} \right) = \sqrt{\int _{0}^{1} d^2_w \left( {\tilde{a}} (\alpha ), {\tilde{b}} (\alpha ) \right) \mathrm{d} \varphi (\alpha )}, \end{aligned}$$
(2)

where

$$\begin{aligned} d^2_w \left( {\tilde{a}} (\alpha ), {\tilde{b}} (\alpha ) \right) = \int _{0}^{1} \left( f_{{\tilde{a}}} \left( \alpha , \lambda \right) - f_{{\tilde{b}}} \left( \alpha , \lambda \right) \right) ^2 \mathrm{d} W (\lambda ) \end{aligned}$$

with \(f_{{\tilde{a}}} \left( \alpha , \lambda \right) = \lambda a^R (\alpha ) + (1- \lambda ) a^L (\alpha ) \), and \(W, \varphi \) are two normalized weighting measures (see Bertoluzza et al. (1995) for some examples of \(W, \varphi \) and further details).

Then, we have the following corollary, which was established in González-Rodríguez et al. (2006):

Corollary 2

Let \(X_1, X_2, \ldots , X_n\) be a sample of LRFNs. When testing the null hypothesis

$$\begin{aligned} H_0 : {\mathbb {E}} X = {\tilde{V}} \end{aligned}$$

at the nominal significance level p, \(H_0\) should be rejected if

$$\begin{aligned} \frac{D^\varphi _W \left( {\bar{X}}, {\tilde{V}}\right) ^2}{{\hat{S}}^2} > z_{1-p}, \end{aligned}$$

where \(z_{q} \) is the q-th empirical quantile of the bootstrap distribution, which is given by

$$\begin{aligned} \frac{D^\varphi _W \left( {\bar{X}}^*, {\bar{X}}\right) ^2}{{\hat{S}}_*^2} \end{aligned}$$

and with

$$\begin{aligned} {\bar{X}}^* = \frac{1}{n} \sum _{i=1}^{n} X_i^*, \quad {\hat{S}}_*^2= \frac{1}{n-1} \sum _{i=1}^{n} D^\varphi _W \left( X_i^*, {\bar{X}}^* \right) ^2, \end{aligned}$$

where \(X_1^*, X_2^*, \ldots , X_n^*\) is a bootstrap sample obtained from the initial sample \(X_1, X_2, \ldots , X_n\).
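The bootstrap procedure behind Corollary 2 can be sketched as follows for triangular fuzzy numbers, taking both \(W\) and \(\varphi \) to be the Lebesgue measure on \([0,1]\) (a simplifying assumption; all function names are ours). Instead of comparing the test statistic with the bootstrap quantile \(z_{1-p}\), the sketch returns the equivalent empirical p-value, so \(H_0\) is rejected when this value falls below p:

```python
import numpy as np

ALPHAS = np.linspace(0.0, 1.0, 101)
LAMBDAS = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(0)

def integrate01(y, axis=-1):
    """Trapezoid rule on a uniform grid over [0, 1]."""
    y = np.moveaxis(np.asarray(y, dtype=float), axis, -1)
    return (y[..., 1:] + y[..., :-1]).sum(axis=-1) / (2.0 * (y.shape[-1] - 1))

def support_values(tri):
    """f(alpha, lambda) = lambda * a^R(alpha) + (1 - lambda) * a^L(alpha) on the grid."""
    aL, aC, aR = np.asarray(tri, dtype=float)
    lo, hi = aL + ALPHAS * (aC - aL), aR - ALPHAS * (aR - aC)
    return np.outer(LAMBDAS, hi) + np.outer(1.0 - LAMBDAS, lo)

def D_W(a, b):
    """D^phi_W metric (2) with uniform W and phi."""
    diff2 = (support_values(a) - support_values(b)) ** 2   # shape (lambda, alpha)
    inner = integrate01(diff2, axis=0)                     # d^2_w of the alpha-cuts
    return float(np.sqrt(integrate01(inner)))

def bootstrap_p_value(sample, V, n_boot=1000):
    """Empirical p-value of H_0: E X = V (Corollary 2); reject when it is below p."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    xbar = sample.mean(axis=0)       # fuzzy sample mean (component-wise for triangles)
    S2 = sum(D_W(x, xbar) ** 2 for x in sample) / (n - 1)
    stat = D_W(xbar, V) ** 2 / S2
    boot = []
    for _ in range(n_boot):
        bs = sample[rng.integers(0, n, n)]                 # resample with replacement
        bbar = bs.mean(axis=0)
        S2b = sum(D_W(x, bbar) ** 2 for x in bs) / (n - 1)
        if S2b == 0.0:               # degenerate resample (all elements identical)
            continue
        boot.append(D_W(bbar, xbar) ** 2 / S2b)
    return float(np.mean(np.asarray(boot) >= stat))
```

Each fuzzy number is encoded here by the triple \((a^L, a^C, a^R)\), for which the sample mean is simply the component-wise mean.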

We will use both these tests in the experimental analysis of the practical applicability of our simulation procedures.

3 Generation of the secondary (bootstrap) sample

Let \({\mathcal {A}} = \{ {\tilde{a}}_1, \ldots , {\tilde{a}}_m \}\) be a primary sample of LRFNs. These values are treated as an input set for the methods proposed further in this paper. We assume that we do not have (and, moreover, we do not need) any additional information about a source (or a model) of the fuzzy numbers belonging to \({\mathcal {A}}\). Note, however, that in many cases known from the literature, such additional information is often assumed (see, e.g., Colubi et al. 2002; Hryniewicz 2015; Hryniewicz et al. 2015; Nowak and Romaniuk 2013, 2017; Romaniuk 2016 for various approaches to the problem of fuzzy number modeling). Therefore, only a strictly nonparametric approach should be used to build a secondary (bootstrap) sample \({\mathcal {B}} = \{ {\tilde{b}}_1, \ldots , {\tilde{b}}_n \}\) of LRFNs, which should be, in some way, “similar” to the fuzzy numbers from \({\mathcal {A}}\).

Let \({\tilde{a}}_j (\alpha ) = \left[ a_j^L (\alpha ), a_j^R (\alpha ) \right] \) be an \(\alpha \)-cut of \({\tilde{a}}_j\) for some \(\alpha \in [0,1]\). For simplicity, we assume that there are \(k+1\) possible values of \(\alpha \), so we have \(\alpha \in \left\{ \alpha _0, \alpha _1, \ldots , \alpha _k \right\} \), where \(\alpha _0 = 0< \alpha _1< \cdots < \alpha _k = 1\). We also assume that \(a_j^L (1) = a_j^R (1) = a_j (1)\) for each \({\tilde{a}}_j\). However, this requirement can be easily relaxed in a simulation procedure presented further.

During the first step of an initialization procedure (a setup of simulation, see Algorithm 1), a set of cores \({\mathcal {C}} (1)\) is found, based on \({\mathcal {A}}\). Hence, we have

$$\begin{aligned} {\mathcal {C}} (1) = \left\{ a_1 (1), \ldots , a_m (1)\right\} . \end{aligned}$$

For simplicity of notation, we assume that the set \({\mathcal {C}}(1)\) is already ordered, i.e., \(a_1 (1) \le a_2 (1) \le \cdots \le a_m (1)\).

Algorithm 1

During the second step of the initialization procedure, sets of incremental spreads for all possible \(\alpha \)-cuts are constructed. Let

$$\begin{aligned} s_j^L (\alpha _i) = a_j^L (\alpha _{i+1}) - a_j^L (\alpha _{i}) \end{aligned}$$
(3)

be the difference between left ends of \(\alpha \)-cuts for \(\alpha _{i+1}\) and \(\alpha _{i}\), for the given fuzzy number \({\tilde{a}}_j\). We call such a difference an incremental left spread for the level i. In the same manner, we have

$$\begin{aligned} s_j^R (\alpha _i) = a_j^R (\alpha _{i}) - a_j^R (\alpha _{i+1}), \end{aligned}$$
(4)

which is the difference between the right ends of \(\alpha \)-cuts for \(\alpha _{i}\) and \(\alpha _{i+1}\), for the given fuzzy number \({\tilde{a}}_j\). It will be called an incremental right spread for the level i. Then, the sets of left and right incremental spreads, given by

$$\begin{aligned} {{\mathcal {S}}^{L}} (\alpha _{i})= & {} \left\{ s_{1}^{L} (\alpha _{i}), \ldots , s_{m}^{L} (\alpha _{i}) \right\} , \nonumber \\ {{\mathcal {S}}^{R}} (\alpha _{i})= & {} \left\{ s_{1}^{R} (\alpha _{i}), \ldots , s_{m}^{R} (\alpha _{i}) \right\} \end{aligned}$$
(5)

for \(\alpha _{k-1}, \alpha _{k-2}, \ldots , \alpha _{0}\) can be found. It should be noted that the construction of (5) has to be made from the highest value of \(\alpha \) to the lowest one (i.e., from the core of a fuzzy number to its support). We also assume, in the same manner as for the set of cores \({\mathcal {C}} (1)\), that each of the sets (5) is already ordered, so that

$$\begin{aligned} 0 \le s_1^L (\alpha _i)\le \cdots \le s_m^L (\alpha _i), \quad 0 \le s_1^R (\alpha _i) \le \cdots \le s_m^R (\alpha _i) \end{aligned}$$

for all \(\alpha _i\).

Let us illustrate this initialization procedure with a numerical toy example.

Example 1

Suppose that our primary sample consists of only three triangular fuzzy numbers [0, 1, 3], [1, 2.5, 5] and [1, 3.5, 5] (see Fig. 3). Because these numbers are strictly triangular, in the following we consider only two different \(\alpha \)-levels: \(\alpha _1=1\) (cores) and \(\alpha _0=0\) (supports). For these data the set of cores is \({\mathcal {C}} (1)=\left\{ 1, 2.5, 3.5\right\} \). The ordered sets of incremental left and right spreads are \( {\mathcal {S}}^L (\alpha _0)= \left\{ 1, 1.5, 2.5 \right\} \) and \( {\mathcal {S}}^R (\alpha _0)= \left\{ 1.5, 2, 2.5 \right\} \), respectively.

Fig. 3 Primary sample \({\mathcal {A}}\) in Example 1
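The initialization procedure of this section, restricted to triangular fuzzy numbers (so only the levels \(\alpha _0 = 0\) and \(\alpha _1 = 1\) occur), can be sketched in Python as follows; the function name is ours, and the code reproduces the sets obtained in Example 1:

```python
import numpy as np

def initialize(sample):
    """Setup step: the ordered set of cores C(1) and the ordered incremental
    spread sets S^L(alpha_0), S^R(alpha_0) for triangular numbers (aL, aC, aR)."""
    sample = np.asarray(sample, dtype=float)
    cores = np.sort(sample[:, 1])                    # C(1)
    s_left = np.sort(sample[:, 1] - sample[:, 0])    # eq. (3): a^C - a^L
    s_right = np.sort(sample[:, 2] - sample[:, 1])   # eq. (4): a^R - a^C
    return cores, s_left, s_right

cores, s_left, s_right = initialize([(0, 1, 3), (1, 2.5, 5), (1, 3.5, 5)])
# cores   -> [1.0, 2.5, 3.5]
# s_left  -> [1.0, 1.5, 2.5]
# s_right -> [1.5, 2.0, 2.5]
```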

Now, the secondary sample \({\mathcal {B}}\), which consists of n fuzzy numbers, can be generated. In order to do this, we use one of two methods, based on two kinds of distributions.

3.1 The d-method based on a discrete distribution d(x)

Let us start with the description of the generation procedure in the case of the d-method, based on a discrete probability distribution d(x). In the proposed procedure, two steps are necessary to construct a fuzzy number \({\tilde{b}}_j \in {\mathcal {B}}\), where \(j=1, \ldots , n\) (see also Algorithm 2).

Algorithm 2

Firstly, the value of the core \(b_j (1)\) is found using a discrete distribution on the values from the set \({\mathcal {C}}(1)\). It means that the generated value \(b_j (1) = C\) is a random element taken from the set \({\mathcal {C}}(1)\) according to the probability distribution d(x). In this paper, we assume that d(x) is uniform on \({\mathcal {C}}(1)\), i.e.,

$$\begin{aligned} \Pr \left( C= a_l (1)\right) = d(a_l (1)) = \frac{1}{m}, \end{aligned}$$

where \(l= 1, \ldots , m\). Therefore, we randomly (and uniformly) pick a single value from the set \({\mathcal {C}}(1)\) and treat it as the core of the newly constructed LRFN \({\tilde{b}}_j\).

Secondly, consecutive \(\alpha \)-cuts of the given \({\tilde{b}}_j\) are found, starting from its core and ending at its support. Thus, we proceed from \({\tilde{b}}_j (\alpha _{k-1})\) down to \({\tilde{b}}_j (0)\). For each \(\alpha _i\), the value of the left end of the \(\alpha \)-cut of \({\tilde{b}}_j\) is found, using

$$\begin{aligned} b_j^L (\alpha _i) = b_j^L (\alpha _{i+1}) - S^L (\alpha _i), \end{aligned}$$
(6)

where \(S^L (\alpha _i)\) is an independently drawn random value from the set \({\mathcal {S}}^L (\alpha _i)\). Once again, the uniform discrete distribution d(x) is used, for which

$$\begin{aligned} \Pr \left( S^L (\alpha _i)= s_l^L (\alpha _i)\right) = d (s_l^L (\alpha _i)) = \frac{1}{m}, \end{aligned}$$

where \(l= 1, \ldots , m\). In the same manner, the right end of each \(\alpha \)-cut of \({\tilde{b}}_j\) is constructed, using

$$\begin{aligned} b_j^R (\alpha _i) = b_j^R (\alpha _{i+1}) + S^R (\alpha _i), \end{aligned}$$
(7)

where \(S^R (\alpha _i)\) is independently drawn from the set \({\mathcal {S}}^R (\alpha _i)\), using the same uniform discrete distribution d(x). Formulas (6) and (7) state that the new left (right) end of the \(\alpha _i\)-cut is obtained by subtracting (adding) a random element of the set \({\mathcal {S}}^L (\alpha _i)\) (\({\mathcal {S}}^R (\alpha _i)\), respectively) from (to) the previously generated left (right) end of the \(\alpha _{i+1}\)-cut. Therefore, the new fuzzy number \({\tilde{b}}_j\) is approximated by intervals for the consecutive values of \(\alpha \) (from 1 at the top to 0 at the bottom).
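The two steps above can be sketched in a few lines of Python. The following snippet is an illustrative sketch of Algorithm 2 under our notation (the function names and the dictionary representation of an LRFN are ours, not part of the paper), with the primary sample from Example 1 used as input:

```python
import random

# An LRFN is stored as a dict {alpha: (left, right)}, with alpha levels
# from 0.0 (support) up to 1.0 (core).  Illustrative sketch only.

def incremental_spreads(sample, alphas):
    """Extract the set of cores C(1) and the sets of incremental
    left/right spreads S^L(alpha_i), S^R(alpha_i) from the primary sample."""
    top = alphas[-1]                              # alpha = 1 (the core level)
    cores = [a[top][0] for a in sample]           # left == right at the core
    s_left, s_right = {}, {}
    for lo, hi in zip(alphas[:-1], alphas[1:]):   # alphas sorted increasingly
        s_left[lo] = [a[hi][0] - a[lo][0] for a in sample]
        s_right[lo] = [a[lo][1] - a[hi][1] for a in sample]
    return cores, s_left, s_right

def d_method(sample, alphas, rng=random):
    """Generate one secondary LRFN using the uniform discrete distribution d(x)."""
    cores, s_left, s_right = incremental_spreads(sample, alphas)
    c = rng.choice(cores)                         # step 1: draw the core
    new = {alphas[-1]: (c, c)}
    for alpha in reversed(alphas[:-1]):           # step 2: from core to support
        upper = min(a for a in new if a > alpha)  # previously generated cut
        new[alpha] = (new[upper][0] - rng.choice(s_left[alpha]),   # formula (6)
                      new[upper][1] + rng.choice(s_right[alpha]))  # formula (7)
    return new

# Primary sample from Example 1: [0, 1, 3], [1, 2.5, 5], [1, 3.5, 5]
primary = [{1.0: (1.0, 1.0), 0.0: (0.0, 3.0)},
           {1.0: (2.5, 2.5), 0.0: (1.0, 5.0)},
           {1.0: (3.5, 3.5), 0.0: (1.0, 5.0)}]
b = d_method(primary, [0.0, 1.0], random.Random(1))
```

By construction, the generated \({\tilde{b}}_j\) has nested \(\alpha \)-cuts, its core belongs to \({\mathcal {C}}(1)\), and its incremental spreads belong to the respective sets \({\mathcal {S}}^L (\alpha _i)\), \({\mathcal {S}}^R (\alpha _i)\).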

In some sense, the fuzzy number \({\tilde{b}}_j\), obtained in this way, is similar to the LRFNs from the primary sample \({\mathcal {A}}\). The core of \({\tilde{b}}_j\) is one of the “true” cores from \({\mathcal {C}}(1)\), and its spreads are drawn from the “true” spreads belonging to \({\mathcal {S}}^L (\alpha _i)\) or \({\mathcal {S}}^R (\alpha _i)\). It is easily seen that we have

$$\begin{aligned} {\mathbb {E}} C&= \frac{1}{m} \sum _{l=1}^{m} a_l (1) = {\bar{a}} (1), \\ {\mathbb {E}} S^L (\alpha _i)&= \frac{1}{m} \sum _{l=1}^{m} s_l^L (\alpha _i) = {\bar{s}}^L (\alpha _i), \\ {\mathbb {E}} S^R (\alpha _i)&= {\bar{s}}^R (\alpha _i), \end{aligned}$$

so the expected values of the core and the spreads of \({\tilde{b}}_j\) are precisely equal to the respective means for LRFNs from \({\mathcal {A}}\). In the same way,

$$\begin{aligned} {{\mathrm{Var}}}C&= \frac{1}{m} \sum _{l=1}^{m} \left( a_l (1) - {\bar{a}} (1) \right) ^2 = s_{a (1)}^2, \\ {{\mathrm{Var}}}S^L (\alpha _i)&= s_{s^L (\alpha _i)}^2, \\ {{\mathrm{Var}}}S^R (\alpha _i)&= s_{s^R (\alpha _i)}^2, \end{aligned}$$

meaning that \({\tilde{b}}_j\) exactly “imitates” the statistical behavior of the sample \({\mathcal {A}}\), without the necessity of introducing any additional knowledge about the model which (possibly) generated the primary sample.

Now, let us continue our example by showing how the secondary bootstrap-like sample is constructed. We will show the construction of only one element of this sample. The remaining elements are constructed in the same way.

Example 1

(Continued) The core of a new element of the secondary sample \({\mathcal {B}}\) is, in this example, randomly chosen (with equal probabilities 1/3) from the set \(\left\{ 1, 2.5, 3.5\right\} \); let this chosen value be equal to \(b_1^L (1)=b_1^R (1)=1\). Then, we randomly take (also with equal probabilities 1/3) the left and right incremental spreads on the remaining \(\alpha \)-level. Suppose that for \(\alpha =0\) we have chosen \(S_1^L (0)=1.5\), \(S_1^R (0)=2.5\). Thus, the respective \(\alpha \)-cuts of the new element \({\tilde{b}}_1\) of the secondary sample, calculated according to (6)–(7), are defined by the following limits: \(b_1^L (1)=b_1^R (1)=1, b_1^L(0)=-0.5, b_1^R(0)=3.5\) (see Fig. 4). It appears that \({\tilde{a}}_1\) is “the most similar” to \({\tilde{b}}_1\). Then, using the measures considered in Sect. 2.2, we get \(m_{\infty } \left( {\tilde{a}}_1, {\tilde{b}}_1 \right) = 0.333333, m_{l_1} \left( {\tilde{a}}_1, {\tilde{b}}_1 \right) = 0.5, m_{H} \left( {\tilde{a}}_1, {\tilde{b}}_1 \right) = 0.25, m_\mathrm{TD} \left( {\tilde{a}}_1, {\tilde{b}}_1 \right) =0.694444\).
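The quoted \(\alpha \)-cut limits follow directly from formulas (6)–(7); a minimal arithmetic check:

```python
# Reproducing the alpha-cut limits of b~_1 in Example 1: the drawn core
# is 1, and the drawn incremental spreads are S^L(0) = 1.5, S^R(0) = 2.5.
core = 1.0
b_left_0 = core - 1.5   # formula (6): b^L(0) = b^L(1) - S^L(0)
b_right_0 = core + 2.5  # formula (7): b^R(0) = b^R(1) + S^R(0)
assert (b_left_0, b_right_0) == (-0.5, 3.5)
```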

Fig. 4

Primary sample \({\mathcal {A}}\) in Example 1 (dashed lines) and the new element \({\tilde{b}}_1 \in {\mathcal {B}}\) (solid line) generated using the d-method in Example 1

3.2 The w-method based on a mixed discrete uniform distribution w(x)

We may also be interested in an additional level of “freedom” when creating the secondary sample \({\mathcal {B}}\). It is easy to see that if \({\tilde{b}}_j\) is generated using the method described in Sect. 3.1 (i.e., using the uniform discrete distribution d(x)), its core is exactly equal to one of the values from \({\mathcal {C}}(1)\). Also its spreads are given by the respective values from the sets \({\mathcal {S}}^L (\alpha _i)\) or \({\mathcal {S}}^R (\alpha _i)\).

Yet, in some cases, the creation of a more diversified sample \({\mathcal {B}}\) can be useful. Due to such diversification, the values from \({\mathcal {B}}\) could be “closer” to the (unknown) hidden model than the samples from \({\mathcal {A}}\), especially if the number of elements in \({\mathcal {A}}\) is strictly limited. Consider, for example, the case when there are only two fuzzy numbers in \({\mathcal {A}}\), described by only two \(\alpha \)-cuts. The random numbers \({\tilde{b}}_j\), generated using the method described in Sect. 3.1, have no more than two possible values of the core and four possible left/right ends of the support. Moreover, if a more classical resampling method is applied (like the “classical” bootstrap), then these two elements from \({\mathcal {A}}\) are repeated over and over during the construction of the LRFNs from \({\mathcal {B}}\). Thus, no new “knowledge” about other possible outcomes, which could be “produced” by the unknown model, can be obtained.

Of course, notwithstanding the introduced diversification, the secondary sample \({\mathcal {B}}\) should still be sufficiently “similar” to the primary set \({\mathcal {A}}\). If such a requirement is not fulfilled, then our knowledge resulting from \({\mathcal {B}}\) can be misleading, and our suppositions about the original source (i.e., the model of \({\mathcal {A}}\)) can be incorrect. However, no strict prior knowledge about the model for the primary sample has been assumed in this paper. Therefore, the proposed generation method should be strictly nonparametric, without any additional and more detailed assumptions.

In statistics, if we do not want to introduce any prior knowledge, we have to use the so-called non-informative probability distributions. A commonly used model of such a distribution is the uniform density on an interval [c, d], denoted further by U([c, d]). We will use this density in the construction of the probability distribution used for generation purposes.

3.2.1 The w(x) distribution, and its properties

Let

$$\begin{aligned} x_1< x_2< \cdots < x_m \end{aligned}$$
(8)

be a strictly increasing sequence of m distinct values. We propose a distribution w(x), which is a mixture of a discrete distribution and a continuous probability density, given by the following formula

$$\begin{aligned} w(x)&= \frac{1}{2m} \delta _x (x_1) + \frac{1}{m} w_{1,2} (x) + \frac{1}{m} w_{2,3} (x) + \cdots \nonumber \\&\quad + \frac{1}{m} w_{m-1,m} (x) + \frac{1}{2m} \delta _x (x_m), \end{aligned}$$
(9)

where

$$\begin{aligned} w_{l-1,l} (x) = \frac{1}{x_l - x_{l-1}} \mathbb {1} (x \in [x_{l-1}, x_{l}]), \end{aligned}$$
(10)

and \(\delta _x (.)\) is the Dirac measure. If \(X \sim w(x)\), where w(x) is given by (9), then \(X = x_1\) or \(X = x_m\) is taken with the atomic probability \(\frac{1}{2m}\). Hence, the first value \(x_1\) or the last one \(x_m\) from the sequence (8) is selected with equal probabilities. Otherwise, one of the intervals \([x_{l-1}, x_{l}]\), for \(l =2, \ldots , m\), is selected with probability \(\frac{1}{m}\). When such a single interval is selected, we have \(X \sim w_{l-1,l} (x)\), so the output x is generated using the uniform density \(U([x_{l-1}, x_{l}])\), which is described by (10).

Therefore, w(x) can be seen as a certain generalization of the discrete distribution discussed in Sect. 3.1. The pdf w(x) also generates values from the same interval \([x_1, x_m]\), but they are more diversified: not only the values from the sequence (8), but all \(x \in [x_1, x_m]\) can now be obtained.
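Sampling from w(x) is straightforward: draw one of the two endpoint atoms with probability \(\frac{1}{2m}\) each, and otherwise draw uniformly inside a randomly selected interval. A minimal Python sketch (the helper name is ours):

```python
import random

def sample_w(xs, rng=random):
    """Draw one value from the mixed distribution w(x) of Eq. (9) for a
    strictly increasing sequence xs = (x_1, ..., x_m).  Sketch only."""
    m = len(xs)
    u = rng.random()
    if u < 1.0 / (2 * m):          # atom at x_1, probability 1/(2m)
        return xs[0]
    if u < 1.0 / m:                # atom at x_m, probability 1/(2m)
        return xs[-1]
    # otherwise one of the m - 1 intervals [x_{l-1}, x_l] is selected,
    # each with probability 1/m, and the output is uniform inside it
    l = rng.randrange(1, m)
    return rng.uniform(xs[l - 1], xs[l])
```

For the set of cores \({\mathcal {C}}(1)=\{1, 2.5, 3.5\}\) from Example 1, the generated values stay within \([1, 3.5]\) and their long-run average approaches \({\bar{x}} = 7/3\), in line with Lemma 1 below.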

The statistical characteristics of the distribution w(x) are summarized in the following lemma:

Lemma 1

Let \(X \sim w(x)\), where w(x) is a pdf described by (9) and (10). Then

$$\begin{aligned} {\mathbb {E}} X = \frac{1}{m} \sum _{i=1}^{m} x_i = {\bar{x}}, \end{aligned}$$

and

$$\begin{aligned} {{\mathrm{Var}}}X&= \frac{1}{m} \left( \frac{5}{6} x_1^2 + \frac{1}{3} x_1 x_2 + \frac{2}{3} x_2^2 + \frac{1}{3} x_2 x_3 + \cdots \right. \\&\quad \left. + \frac{1}{3} x_{m-1} x_m + \frac{5}{6} x_m^2 \right) - \left( {\bar{x}}\right) ^2 = s_w^2. \end{aligned}$$

Proof

From (9) and (10), we have

$$\begin{aligned} {\mathbb {E}} X = \frac{1}{2m} x_1 + \frac{1}{2m} x_m + \frac{1}{m}\sum _{i=1}^{m-1} \frac{x_i + x_{i+1}}{2} \end{aligned}$$

and

$$\begin{aligned}&{{\mathrm{Var}}}X = \frac{1}{2m} x_1^2 + \frac{1}{2m} x_m^2 \\&\quad + \frac{1}{m}\sum _{i=1}^{m-1} \frac{x_i^2 + x_{i} x_{i+1} +x_{i+1}^2}{3} - \left( {\bar{x}}\right) ^2, \end{aligned}$$

which concludes the proof. \(\square \)

From Lemma 1 we see that if \(X \sim w(x)\), then the expected value of X is precisely equal to the sample mean \({\bar{x}}\). But the variance \(s_w^2\) of X is not equal to the classical estimator, i.e., the standard sample variance \(s^2\). The difference between the variances \(s_w^2\) and \(s^2\) can be important for the intended diversity of the LRFNs in the secondary sample \({\mathcal {B}}\). We have

$$\begin{aligned} s_w^2 - s^2 = \frac{1}{m} \left( -\frac{1}{6} x_1^2 -\frac{1}{6} x_m^2 + \frac{1}{3} x_1 x_2 + \frac{1}{3} \sum _{i=2}^{m-1} x_i \left( x_{i+1} - x_i\right) \right) , \end{aligned}$$

which leads to the following remark:

Remark 1

If \(m \rightarrow \infty \) and all \(x_i >0\), then \(s_w^2 - s^2 \ge 0\). Therefore, the variability (measured by the variance) of \(X \sim w(x)\) is not smaller than the variability of \(X \sim d(x)\), provided that the sample size is large enough and all \(x_i >0\).
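Lemma 1 is easy to check numerically. The sketch below (function names ours) evaluates \(s_w^2\) via \({\mathbb {E}} X^2\) from the proof of Lemma 1 and, for comparison, the variance \(s^2\) of the discrete distribution d(x). For the set of cores \(\{1, 2.5, 3.5\}\) from Example 1 it yields \(s_w^2 = 0.875\) (up to rounding); note also that for \(m = 2\) the difference \(s_w^2 - s^2\) reduces to \(-(x_2 - x_1)^2/12\), so for very small samples w(x) may even be slightly less dispersed:

```python
def w_variance(xs):
    """Var X for X ~ w(x), via E X^2 from the proof of Lemma 1:
    E X^2 = (x_1^2 + x_m^2)/(2m) + (1/m) sum (x_i^2 + x_i x_{i+1} + x_{i+1}^2)/3."""
    m = len(xs)
    mean = sum(xs) / m
    ex2 = (xs[0] ** 2 + xs[-1] ** 2) / (2.0 * m)
    ex2 += sum((a * a + a * b + b * b) / 3.0
               for a, b in zip(xs[:-1], xs[1:])) / m
    return ex2 - mean ** 2

def d_variance(xs):
    """Var X for X ~ d(x), i.e., the sample variance s^2 (denominator m)."""
    m = len(xs)
    mean = sum(xs) / m
    return sum((x - mean) ** 2 for x in xs) / m
```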

3.2.2 Generation procedure

Now, to generate a fuzzy number \({\tilde{b}}_j \in {\mathcal {B}}\), where \(j=1, \ldots , n\), the previously introduced distribution w(x) is used instead of the discrete distribution d(x) (see also Algorithm 3). However, the overall procedure of the construction of \({\tilde{b}}_j\) is similar to the previous case, described in Sect. 3.1.

Algorithm 3

In the first step, the value of the core \(b_j (1) = C\) is drawn, using the distribution w(x) based on the elements of the set \({\mathcal {C}} (1)\), so \(C \sim w(x)\), where \(x \in {\mathcal {C}} (1)\). Next, the consecutive \(\alpha \)-cuts of \({\tilde{b}}_j\) are calculated, starting from the value \(\alpha _{k-1}\) and ending at \(\alpha _0 = 0\). For each \(\alpha _i\), the left end of \({\tilde{b}}_j (\alpha _i)\) is given by (6), where \(S^L (\alpha _i)\) is an independently drawn random value from the set \({\mathcal {S}}^L (\alpha _i)\), using the distribution w(x) for this set. In the same way, the right end of \({\tilde{b}}_j (\alpha _i)\) is given by (7), where \(S^R (\alpha _i)\) is independently drawn from the set \({\mathcal {S}}^R (\alpha _i)\), using the respective distribution w(x) for this set.

Let us continue our example using the generation method described in this subsection. We will show the construction of only one element of the secondary sample. The remaining elements are constructed in the same way.

Example 1

(Continued) Let us start with the generation of the core of a new fuzzy number \({\tilde{b}}_1\). According to the density function w(x) defined by (9), there are four possibilities for choosing this value: take 1 with probability 1/6, take a randomly chosen (using the uniform distribution) number from the interval [1, 2.5] with probability 1/3, take a randomly chosen number from the interval [2.5, 3.5] with probability 1/3, or take 3.5 with probability 1/6 (see also Fig. 5). Suppose that the second option has been chosen, and the new core has been set to \(b_1 (1)=1.75\). Now, consider the left and right spreads for \(\alpha =0\). For choosing the value of the left incremental spread there are also four possibilities: take 1 with probability 1/6, take a randomly chosen number from the interval [1, 1.5] with probability 1/3, take a randomly chosen number from the interval [1.5, 2.5] with probability 1/3, or take 2.5 with probability 1/6. Suppose that the first option has been chosen, and the new left incremental spread has been set to \(S_1^L (0)=1\). Similarly, for choosing the value of the right incremental spread there are four possibilities: take 1.5 with probability 1/6, take a randomly chosen number from the interval [1.5, 2] with probability 1/3, take a randomly chosen number from the interval [2, 2.5] with probability 1/3, or take 2.5 with probability 1/6. Suppose that the third option has been chosen, and the new right incremental spread has been set to \(S_1^R (0)=2.1\). Finally, the newly generated element of the secondary sample is the fuzzy number defined by its core \(b_1 (1)=1.75\) and the \(\alpha \)-cut limits \(b_1^L(0)=0.75, b_1^R(0)=3.85\) (see Fig. 6). This new element \({\tilde{b}}_1\) seems to be similar to both \({\tilde{a}}_1\) and \({\tilde{a}}_2\).
Using the considered measures, however, we can compare the pairs \({\tilde{a}}_1,{\tilde{b}}_1\) and \({\tilde{a}}_2,{\tilde{b}}_1\), and try to decide which of the fuzzy numbers from the initial sample is more similar to \({\tilde{b}}_1\). We get \(m_{\infty } \left( {\tilde{a}}_1, {\tilde{b}}_1 \right) = 0.75, m_{l_1} \left( {\tilde{a}}_1, {\tilde{b}}_1 \right) = 1.3625, m_{H} \left( {\tilde{a}}_1, {\tilde{b}}_1 \right) = 0.8, m_\mathrm{TD} \left( {\tilde{a}}_1, {\tilde{b}}_1 \right) =1.11778\) and \(m_{\infty } \left( {\tilde{a}}_2, {\tilde{b}}_1 \right) = 0.5, m_{l_1} \left( {\tilde{a}}_2, {\tilde{b}}_1 \right) = 1.29375, m_{H} \left( {\tilde{a}}_2, {\tilde{b}}_1 \right) = 0.95, m_\mathrm{TD} \left( {\tilde{a}}_2, {\tilde{b}}_1 \right) =1.23722\). Then, if we choose the supremum or the TD measure, we conclude that the fuzzy numbers \({\tilde{a}}_1,{\tilde{b}}_1\) are the most similar. If, instead, we use \(m_{l_1}\) or \(m_{H}\), the triangular fuzzy numbers \({\tilde{a}}_2,{\tilde{b}}_1\) are the most similar.

Fig. 5

A plot of the density w(x) for the set \({\mathcal {C}} (1)\) in Example 1

Fig. 6

Primary sample \({\mathcal {A}}\) in Example 1 (dashed lines) and the new element \({\tilde{b}}_1\) (solid line) generated using the w-method in Example 1

4 Properties of bootstrap-like secondary samples

Having introduced both methods, we can numerically compare the secondary samples generated by them. Moreover, we also apply the classical bootstrap in order to verify whether there are any significant differences between this widely used simulation method (see, e.g., González-Rodríguez et al. (2006), Hung (2006), Montenegro et al. (2004) and Ramos-Guajardo and Lubiano (2012) for a more detailed discussion) and the algorithms proposed in this paper.

Let us start with a certain population \({\mathcal {P}}_{n_0}\), which consists of \(n_0\) LRFNs. From this population, we randomly draw m elements; these elements constitute the primary sample \({\mathcal {A}}_m\). Afterwards, using the fuzzy numbers from this primary sample, three methods (i.e., the classical bootstrap, the d-method, and the w-method) are used to generate the secondary sample \({\mathcal {B}}_n\), which consists of n elements.

In our numerical experiments, different settings are used: a moderate population \({\mathcal {P}}_{100}\) (for which \(n_0 = 100\)) together with a small primary sample \({\mathcal {A}}_5\) (where \(m=5\)) and a moderate secondary sample \({\mathcal {B}}_{100}\) (where \(n=100\)), and a bigger population \({\mathcal {P}}_{200}\) with a moderate primary sample \({\mathcal {A}}_{100}\) and a rather big secondary sample \({\mathcal {B}}_{200}\). This allows us to compare outcomes for the classical bootstrap, the d-method, and the w-method, for the cases when preliminary information about a model (which is available only via the analysis of the primary sample) is very sparse (in the case of \({\mathcal {A}}_5\)) or relatively abundant (for \({\mathcal {A}}_{100}\)).

For simplicity, only triangular fuzzy numbers will be considered, i.e., only two \(\alpha \)-cuts (where \(\alpha _0 = 0\) and \(\alpha _1 = 1\)) are used to construct the whole LRFN. Actually, such simple types of fuzzy numbers are frequently used by practitioners. However, both the d-method and the w-method can easily be used to generate the secondary sample, even if more \(\alpha \)-cuts are considered.

In the following numerical experiments, two types of triangular numbers are considered as a model for the population \({\mathcal {P}}_{n_0}\). The first one (further referred to as the “type A number”) is a fuzzy number with an expected symmetrical spread, where the center is random and has the standard normal distribution N(0, 1), and the semiwidths of the support are given as independent chi-square variables with one degree of freedom. A similar LRFN is discussed in detail in Colubi et al. (2002). The second kind of fuzzy number (the “type B number”) has a strictly non-symmetrical shape. In this case, the center points are described by the gamma distribution with the shape parameter 1 and the scale parameter 2, and the semiwidths of the support are drawn from independent exponential distributions with parameter 1 (for the left spread) or 2 (for the right spread).
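Both population models can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code; in particular, reading the exponential "parameters" 1 and 2 as rates is our assumption, since the text does not spell out the parametrization:

```python
import random

def type_a(rng=random):
    """Type A triangular number: N(0, 1) center, independent chi-square(1)
    semiwidths of the support (a chi-square(1) variate is a squared N(0, 1))."""
    c = rng.gauss(0.0, 1.0)
    return (c - rng.gauss(0.0, 1.0) ** 2, c, c + rng.gauss(0.0, 1.0) ** 2)

def type_b(rng=random):
    """Type B triangular number: Gamma(shape 1, scale 2) center; exponential
    semiwidths with parameter 1 (left) and 2 (right), read here as rates
    (an assumption about the parametrization)."""
    c = rng.gammavariate(1.0, 2.0)
    return (c - rng.expovariate(1.0), c, c + rng.expovariate(2.0))
```

Under these assumptions a type A number has a center with mean 0 and semiwidths with mean 1, while a type B number has a center with mean 2, a left semiwidth with mean 1 and a right semiwidth with mean 0.5.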

We are interested in the analysis of mutual relations between the primary and secondary samples for the different generation procedures and the mentioned types of LRFNs. Therefore, the properties of both the primary sample and the generated secondary set are statistically summarized by comparing the sample mean with the population mean, calculated for the support and the center of the fuzzy numbers. From the statistical point of view, the variability of the simulated fuzzy numbers is also very important. Thus, the standard deviation of the support and the center of the LRFNs is also computed for the secondary (i.e., generated) sample.

Moreover, the simulated fuzzy numbers should give us some additional “insight” into the model, which is (generally) completely unknown and “hidden” in the data from the primary sample. In an ideal situation, LRFNs from the secondary sample should be (in some way) “similar” to the numbers from the primary sample, but, simultaneously, not exactly “the same” as the elements from \({\mathcal {A}}_m\), and also “very close” to the population. Therefore, the values of some measures (see Sect. 2) are evaluated for each possible pair of fuzzy numbers. These pairs consist of one “old” LRFN (i.e., from \({\mathcal {A}}\)) and one “new”, generated fuzzy number (i.e., an element from \({\mathcal {B}}\)). The obtained measure values are also summarized using common statistics, like the minimum, maximum, mean, and standard deviation. Afterwards, we can conjecture whether a given generation method produces fuzzy numbers which are “the same as”, “similar” or only “close” (and to what extent) to the LRFNs from the primary sample.

4.1 Small primary sample, type A fuzzy number

Based on the small sample \({\mathcal {A}}_5\) of type A triangular fuzzy numbers, three moderate secondary samples \({\mathcal {B}}_{100}\) were generated, using the classical bootstrap, the d-method, and the w-method. Then, the means of the core \({\bar{X}}^*_C\) (see Fig. 7), the left end of the support \({\bar{X}}^*_L\) (see Fig. 8), and the right end \({\bar{X}}^*_R\) (see Fig. 9) for each of the simulated samples were calculated. From now on, the results obtained with the bootstrap are marked by circles in the graphs, with the d-method—by diamonds, and with the w-method—by squares. Horizontal bold lines correspond to the means of the primary sample \({\mathcal {A}}\) for the core \({\bar{X}}^{\mathcal {A}}_C\), the left end of the support \({\bar{X}}^{\mathcal {A}}_L\) and the right end \({\bar{X}}^{\mathcal {A}}_R\), and the axes of the respective graphs start exactly at the population means (for the core \({\bar{X}}_C\), for the left end of the support \({\bar{X}}_L\) and for the right end \({\bar{X}}_R\)).

Fig. 7

Small primary sample, type A fuzzy number: the means of the core as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 8

Small primary sample, type A fuzzy number: the means of the left end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 9

Small primary sample, type A fuzzy number: the means of the right end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

As can be seen, each of the simulation methods behaves generally well. In each case, after the generation of 30–40 fuzzy numbers, the mean of the secondary sample \({\bar{X}}^*\) approaches the respective mean of the primary set \({\bar{X}}^{\mathcal {A}}\). Moreover, the d-method and the w-method seem to have some advantages when compared to the classical bootstrap. For example, the means for these two approaches are, in general, closer to \({\bar{X}}^{\mathcal {A}}\) (i.e., the mean of \({\mathcal {A}}_5\)) than in the case of the bootstrap. The respective graphs are also much smoother. Surprisingly, in the case of the w-method, the respective mean is also closer to the “real” result, i.e., the mean of our unknown model, the population \({\mathcal {P}}_{100}\).

Fig. 10

Small primary sample, type A fuzzy number: the standard deviations of the core as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 11

Small primary sample, type A fuzzy number: the standard deviations of the left end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 12

Small primary sample, type A fuzzy number: the standard deviations of the right end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Apart from the comparison of the means, the variability of the generated LRFNs should also be considered. Hence, the standard deviations of the core (see Fig. 10), the left end of the support (see Fig. 11) and its right end (see Fig. 12) are plotted. These graphs are marked in the same way as the previous ones. In each of these cases, the standard deviation of the secondary sample is the lowest when the w-method is used.

Now, we compare the three secondary samples, generated using the considered simulation procedures, with the help of the measures recalled in Sect. 2. Let us assume that \(l ({\tilde{a}}_i, {\tilde{b}}_j)\) is the value of some measure of similarity l(., .) between the LRFNs \({\tilde{a}}_i \in {\mathcal {A}}\) and \({\tilde{b}}_j \in {\mathcal {B}}\). Then, the following notation is used

$$\begin{aligned} {{\mathrm{MinMin}}}&= \min _j \{ \min _i l ({\tilde{a}}_i, {\tilde{b}}_j) \},\\ {{\mathrm{MinMax}}}&= \min _j \{ \max _i l ({\tilde{a}}_i, {\tilde{b}}_j) \}, \\ {{\mathrm{MaxMin}}}&= \max _j \{ \min _i l ({\tilde{a}}_i, {\tilde{b}}_j) \},\\ {{\mathrm{MaxMax}}}&= \max _j \{ \max _i l ({\tilde{a}}_i, {\tilde{b}}_j) \}, \\ {{\mathrm{MeanMin}}}&= \frac{1}{n} \sum _{j=1}^{n} \min _i l ({\tilde{a}}_i, {\tilde{b}}_j), \\ {{\mathrm{MeanMax}}}&= \frac{1}{n} \sum _{j=1}^{n} \max _i l ({\tilde{a}}_i, {\tilde{b}}_j), \\ {{\mathrm{StDevMin}}}&= \frac{1}{n} \sum _{j=1}^{n} \left( \min _i l ({\tilde{a}}_i, {\tilde{b}}_j) - {{\mathrm{MeanMin}}}\right) ^2, \\ {{\mathrm{StDevMax}}}&= \frac{1}{n} \sum _{j=1}^{n} \left( \max _i l ({\tilde{a}}_i, {\tilde{b}}_j) - {{\mathrm{MeanMax}}}\right) ^2. \end{aligned}$$

The respective measures of similarity are summarized in Table 1 (when \({\mathcal {B}}_{100}\) is simulated using the bootstrap approach), Table 2 (in the case of the d-method) and Table 3 (for the w-method). Of course, the bootstrap only repeats fuzzy numbers which are already present in the primary sample. Therefore, the \({{\mathrm{MinMin}}}\) and \({{\mathrm{MinMax}}}\) values for the measures \(m_{l_1}\), \(m_{\infty }\) and \(m_{H}\) are strictly equal to zero. But in the case of the d-method, even the values of these measures are more diversified, so we have \({{\mathrm{StDevMin}}}>0\). The same applies to the w-method. Therefore, these two methods produce LRFNs which are more diversified (and, in some way, “not exactly the same”) than the numbers from \({\mathcal {A}}_5\). However, the generated LRFNs are also “similar” (in the sense of the applied measures of similarity) to the fuzzy numbers from the primary sample, because the obtained \({{\mathrm{MinMin}}}\) and \({{\mathrm{MeanMin}}}\) values are very close to zero. It seems that using the w-method is more promising than the d-method, because the \({{\mathrm{MinMax}}}, {{\mathrm{MaxMax}}}\), and \({{\mathrm{MeanMax}}}\) values are generally smaller for the w-method, while the \({{\mathrm{MeanMin}}}\) values are very similar. Hence, even the LRFNs which are “maximally” distant from the fuzzy numbers from the primary sample are “closer” in the case of the w-method than for the d-method.
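These summary statistics are computed from the pairwise measure values; a short sketch for an arbitrary distance-like measure supplied by the caller (the function name is ours, and, as in the text, no square root is taken in the StDev quantities):

```python
def summarize(dist, primary, secondary):
    """MinMin, MinMax, MaxMin, MaxMax, MeanMin, MeanMax, StDevMin, StDevMax
    for a distance-like measure `dist` between two samples."""
    mins = [min(dist(a, b) for a in primary) for b in secondary]
    maxs = [max(dist(a, b) for a in primary) for b in secondary]
    n = float(len(secondary))
    mean_min, mean_max = sum(mins) / n, sum(maxs) / n
    return {
        "MinMin": min(mins), "MinMax": min(maxs),
        "MaxMin": max(mins), "MaxMax": max(maxs),
        "MeanMin": mean_min, "MeanMax": mean_max,
        # as defined in the text, without taking the square root
        "StDevMin": sum((v - mean_min) ** 2 for v in mins) / n,
        "StDevMax": sum((v - mean_max) ** 2 for v in maxs) / n,
    }
```

The same helper works for any of the measures \(m_{l_1}, m_{\infty }, m_{H}, m_\mathrm{TD}\), once they are implemented for a chosen representation of LRFNs.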

Table 1 Small primary sample, type A fuzzy number: values of measures for the bootstrap
Table 2 Small primary sample, type A fuzzy number: values of measures for the d-method
Table 3 Small primary sample, type A fuzzy number: values of measures for the w-method
Table 4 Small primary sample, type A fuzzy number: minimal measure values for the comparisons with the independent sample \({\mathcal {T}}_{200}\)

Let us analyze the observed similarity in another way. In order to do this, an additional independent sample \({\mathcal {T}}_{200}\), which consists of 200 fuzzy numbers of type A, was generated. Then, three secondary sets \({\mathcal {B}}_{200}\) are sampled based on \({\mathcal {A}}_5\), using the bootstrap, the d-method, and the w-method. For each of the sets \({\mathcal {B}}_{200}\), we find the LRFN which is the nearest to some fuzzy number from \({\mathcal {T}}_{200}\) in the sense of one of the measures \(m_{l_1},m_{\infty }, m_\mathrm{TD}, m_{H}\), i.e., the value

$$\begin{aligned} {{\mathrm{MinMin}}}= \min _j \{ \min _i l ({\tilde{t}}_i, {\tilde{b}}_j) \}, \end{aligned}$$

where \({\tilde{t}}_i \in {\mathcal {T}}_{200}\) and \({\tilde{b}}_j \in {\mathcal {B}}_{200}\), is calculated. The obtained minimal values of these measures for the respective pairs of LRFNs are given in Table 4, and for each measure the minimum appears in boldface. As can be seen, if the w-method is used, then the generated fuzzy number is the most similar to some element from \({\mathcal {T}}_{200}\). In some way, this new independent sample \({\mathcal {T}}_{200}\) gives an additional insight into the “true model”, because it is a supplementary sample from the unknown source which models our LRFNs. Therefore, the w-method produces fuzzy numbers which are the nearest to this model in the considered case. Note that, because the bootstrap only repeats elements from the primary sample, the obtained values of the measures for this method are even 6–7 times bigger than for the best match.

4.2 Small primary sample, type B fuzzy number

Now we analyze the three considered simulation procedures for the case when the small primary sample \({\mathcal {A}}_5\) consists of strictly non-symmetrical triangular fuzzy numbers (i.e., the previously mentioned LRFNs of “type B”). The graphs of the means (for the core—see Fig. 13, for the left end of the support—see Fig. 14, for the right end of the support—see Fig. 15) are very similar to the case described in Sect. 4.1. Once again, these means for the d-method and the w-method are, in general, closer to the respective means of the primary sample than in the case of the bootstrap. Their graphs are also very smooth. Moreover, the plots of the standard deviations behave reasonably well (for the core—see Fig. 16, for the left end of the support—see Fig. 17, for the right end of the support—see Fig. 18). The obtained values are the lowest when the w-method is used.

Fig. 13

Small primary sample, type B fuzzy number: the means of the core as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 14

Small primary sample, type B fuzzy number: the means of the left end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 15

Small primary sample, type B fuzzy number: the means of the right end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 16

Small primary sample, type B fuzzy number: the standard deviations of the core as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 17

Small primary sample, type B fuzzy number: the standard deviations of the left end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 18

Small primary sample, type B fuzzy number: the standard deviations of the right end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

The characteristics of the similarity measures introduced in Sect. 4.1 can also be found in this case (see Tables 5, 6 and 7 for the respective summaries for the different simulation approaches). In general, the conclusions are similar to the case of type A fuzzy numbers, i.e., the bootstrap only repeats LRFNs from the primary sample, while the d-method and the w-method produce a more diversified output, which is still similar (in the sense of the considered measures) to the values from \({\mathcal {A}}_5\). But the decision as to whether the d-method or the w-method is better suited, when the “maximum” distance criterion is taken into account, is not so straightforward now. As can be seen, the \({{\mathrm{MinMax}}}\) values are lower for the d-method, but \({{\mathrm{MaxMax}}}\) and \({{\mathrm{MeanMax}}}\) are lower in the case of the w-method.

Table 5 Small primary sample, type B fuzzy number: values of measures for the bootstrap
Table 6 Small primary sample, type B fuzzy number: values of measures for the d-method
Table 7 Small primary sample, type B fuzzy number: values of measures for the w-method

Once again, we analyze a supplementary, independent sample \({\mathcal {T}}_{200}\) of LRFNs of type B. The fuzzy numbers from the set \({\mathcal {T}}_{200}\) are compared with the three samples \({\mathcal {B}}_{200}\) generated using the classical bootstrap and the two methods introduced in this paper. As in Sect. 4.1, LRFNs from \({\mathcal {T}}_{200}\) and each of the sets \({\mathcal {B}}_{200}\) are compared in order to find the most similar pairs of fuzzy numbers. The obtained minimal values of the measures can be found in Table 8. Also in this case, the w-method generates the fuzzy numbers which are the most similar to some element from the set \({\mathcal {T}}_{200}\), apart from the measure \(m_{\infty }\), for which the d-method gives the best result. The classical bootstrap gives values which are even 2–3 times bigger than the best matches.

Table 8 Small primary sample, type B fuzzy number: minimal measure values for the comparison with the independent sample \({\mathcal {T}}_{200}\)

4.3 Moderate primary sample

In practical situations, apart from small statistical samples consisting of only a few values, larger samples are also used. Therefore, we also analyze the behavior of a moderate primary sample, for which \(m=100\) (i.e., \({\mathcal {A}}_{100}\)), and the corresponding simulated secondary sample \({\mathcal {B}}_{200}\), which is rather large, especially when compared with the previous examples (now we have \(n=200\)). As it turns out, the general conclusions for both type A and type B LRFNs are very similar to the outcomes for the small sample, which were summarized in Sects. 4.1 and 4.2. Hence, we omit a more detailed discussion in order to present another, in some way supplementary, approach.

Up to now, we have analyzed the speed of convergence of the mean of the secondary sample \({\bar{X}}^*\) to the “true” (but, in general, unknown) mean of the population \({\bar{X}}\). In our reasoning, three “focal points” (the core and the left and right ends of the support) have been taken into account. In Colubi et al. (2002), the authors consider an application of the LIL (law of the iterated logarithm) as a tool for convergence diagnostics for simulated fuzzy numbers. Therefore, we will also analyze the behavior of the LIL-scaled distance between \({\bar{X}}^*\) and \({\bar{X}}\) as a function of the secondary sample size n. To keep consistency with our previous analysis, the three mentioned “focal points” will remain at the center of our attention. Hence, the distance for the core

$$\begin{aligned} \frac{\sqrt{n}}{\sqrt{2 n \log \log n}} \vert {\bar{X}}^*_C - {\bar{X}} \vert , \end{aligned}$$
(11)

and the similarly defined measures for the left and right ends of the support will be used further, instead of the supremum distance considered in Colubi et al. (2002).
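The LIL-based diagnostic above can be sketched in a few lines of code. The following minimal Python example (all function and variable names are illustrative, not taken from the paper) computes the distance (11) for the cores of a growing secondary sample, given a reference core value:

```python
import math

def lil_core_distances(cores, true_core):
    """Distance (11) for the cores of a growing secondary sample.

    cores     -- cores of the simulated fuzzy numbers, in generation order
    true_core -- core of the "true" population mean (reference value)
    Returns the scaled distances for n = 3, 4, ..., len(cores)
    (log log n is only defined and positive for n >= 3).
    """
    distances = []
    running_sum = 0.0
    for n, core in enumerate(cores, start=1):
        running_sum += core
        mean = running_sum / n  # core of the mean of the first n elements
        if n >= 3:
            scale = math.sqrt(n) / math.sqrt(2.0 * n * math.log(math.log(n)))
            distances.append(scale * abs(mean - true_core))
    return distances
```

Analogous functions for the left and right ends of the support would only replace the cores with the respective endpoints of the \(\alpha = 0\) cuts.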

Because the secondary sample \({\mathcal {B}}_{200}\) is rather large, the convergence speed for (11) and the other similar measures, as functions of n, is now more visible. We restrict our analysis to type A fuzzy numbers, but the conclusions for type B are similar. The calculated distances as functions of the secondary sample size are plotted in Figs. 19 (the core), 20 (the left end of the support) and 21 (the right end of the support). As can be seen, the bootstrap approach is the worst one, especially for larger values of n, because the obtained distances are, in general, larger for this simulation method. Both the d-method and the w-method produce relatively well-behaved output.

Fig. 19

Moderate primary sample, type A fuzzy number: LIL distances for the core as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 20

Moderate primary sample, type A fuzzy number: LIL distances for the left end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

Fig. 21

Moderate primary sample, type A fuzzy number: LIL distances for the right end of the support as functions of the secondary sample size n (the bootstrap—circles, the d-method—diamonds, the w-method—squares)

5 New bootstrap-like sample as a tool in statistical tests

Apart from the statistical properties of the simulated LRFNs, the possibility of applying the proposed methods in practical statistical settings was also investigated. As an illustrative example, two types of tests for the expected value of fuzzy numbers were considered (see Sect. 2.3 for additional details and notation).

The first one is a bootstrapped version of the test proposed in Körner (2000) (see Corollary 1). From now on, it will be called the K test for the expected value (after its author’s name). The second test is the procedure developed in González-Rodríguez et al. (2006) and Montenegro et al. (2004) (see Corollary 2). It will be called the GRMCG-test for the expected value (also after the authors’ names). In this case, we apply the standard uniform density as the normalized weight measure \(\varphi \) in the metric \(D^{\varphi }_W({\tilde{a}}, {\tilde{b}})\) (2) (see Bertoluzza et al. (1995) and Montenegro et al. (2004) for additional details and other approaches).

As an initial sample in each of these tests, three types of triangular fuzzy numbers are simulated. Types A and B were described in Sect. 4. Type C, considered in Körner (2000), is a fuzzy number whose random center has the standard normal distribution N(0, 1) and whose spreads of the support are independently drawn from the standard uniform distribution U([0, 1]).
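For illustration, a type C sample can be generated as follows. This is a minimal sketch with illustrative names, representing each triangular fuzzy number by the triple (left end of the support, core, right end of the support):

```python
import random

def simulate_type_c(m, seed=0):
    """Simulate m triangular fuzzy numbers of type C:
    random center ~ N(0, 1), left/right spreads ~ U([0, 1]), all independent."""
    rng = random.Random(seed)
    sample = []
    for _ in range(m):
        center = rng.gauss(0.0, 1.0)
        left_spread = rng.uniform(0.0, 1.0)
        right_spread = rng.uniform(0.0, 1.0)
        sample.append((center - left_spread, center, center + right_spread))
    return sample
```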

For each of these types of fuzzy numbers, three different simulation procedures (the classical bootstrap, the d-method and the w-method) are used to generate an input random sample for the test. The number of elements n in such a sample is varied, so that both small and medium sample sizes are considered, i.e., we set \(n=5,10,30,100\). Also, several values of the number of bootstrap replications r (namely \(r=100,200,1000\)) are used to generate the respective bootstrapped distribution of the test statistic; in this way, we investigate the possible influence of this parameter. In each of these experiments, the whole resampling procedure is iterated 100,000 times (see, e.g., Gil et al. (2006b), González-Rodríguez et al. (2006), Montenegro et al. (2004) and Ramos-Guajardo and Lubiano (2012) for additional details of a similar approach).

Based on the respective statistics, in each of the tests for the expected value, the empirical percentage of rejections \({\hat{p}}\) of the true null hypothesis at the nominal significance level \(p=0.05\) is then computed. This estimated value is widely used as a benchmark for bootstrapped versions of statistical tests (see, e.g., Gil et al. 2006b; González-Rodríguez et al. 2006; Montenegro et al. 2004; Ramos-Guajardo et al. 2010; Ramos-Guajardo and Lubiano 2012). The three considered simulation procedures can then be directly compared.
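The benchmarking loop can be illustrated with a crisp stand-in for the fuzzy setting. The sketch below is a simplified, hypothetical version of the procedure (not the paper’s exact implementation): it estimates the empirical rejection rate \({\hat{p}}\) of a bootstrapped test for the mean of crisp data under a true null hypothesis.

```python
import math
import random

def empirical_rejection_rate(mu0=0.0, n=30, r=200, iterations=300,
                             alpha=0.05, seed=0):
    """Estimate p-hat for a bootstrapped mean test under a true H0.

    The test statistic is sqrt(n) * |mean - mu0|; its null distribution is
    approximated by r bootstrap replications centered at the sample mean."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(iterations):
        x = [rng.gauss(mu0, 1.0) for _ in range(n)]  # sample drawn under H0
        xbar = sum(x) / n
        t_obs = math.sqrt(n) * abs(xbar - mu0)
        boot = []
        for _ in range(r):  # r bootstrap replications of the statistic
            xs = [rng.choice(x) for _ in range(n)]
            boot.append(math.sqrt(n) * abs(sum(xs) / n - xbar))
        boot.sort()
        critical = boot[int((1.0 - alpha) * r) - 1]  # empirical quantile
        if t_obs > critical:
            rejections += 1
    return rejections / iterations
```

With a true null hypothesis, the returned rate should stay close to alpha; in the fuzzy case, the statistic and the resampling step are replaced by their fuzzy counterparts (the classical bootstrap, the d-method or the w-method).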

In general, the simulated values of \({\hat{p}}\) for all of the approaches are very close to one another, and the overall properties are very similar. In particular, the empirical percentages of rejections converge to one another for larger values of n and r (e.g., \(n=100\) and \(r=1000\)). However, there are also some notable differences. In order to emphasize them, for each experiment the value of \({\hat{p}}\) nearest to the true significance level p is given in boldface.

Let us start with the K test for the expected value. As can be seen for the fuzzy numbers of type A (see Table 9), type B (see Table 10) and type C (see Table 11), the comparison of the simulation approaches is quite simple. In each of these cases, apart from a few exceptions, the d-method leads to the \({\hat{p}}\) nearest to the assumed significance level p. For all of these exceptions, the classical bootstrap approach gives the closest answer, but even then, the differences between the empirical percentages of rejections for the d-method and the classical bootstrap are not very large (about 0.001–0.002). The differences favor the d-method especially for smaller values of n and r. Altogether, the classical bootstrap occupies the second place with respect to the proximity between \({\hat{p}}\) and p.

Table 9 Simulated values of \({\hat{p}}\) for the K test, type A of LRFNs
Table 10 Simulated values of \({\hat{p}}\) for the K test, type B of LRFNs
Table 11 Simulated values of \({\hat{p}}\) for the K test, type C of LRFNs

For the GRMCG-test, the analysis of the differences between \({\hat{p}}\) and p is not so straightforward. In the case of type A fuzzy numbers (see Table 12), \({\hat{p}}\) is nearest to the true significance level for the w-method when the initial sample is small (\(n = 5, 10\)), and for the d-method when it is larger (\(n=30,100\)). Especially for the small samples, the classical bootstrap approach gives the worst answers, and the differences between the bootstrap and one of the other approaches are quite substantial (about 0.008–0.01).

However, when type B fuzzy numbers are analyzed (see Table 13), the picture is less clear. Firstly, for \(n = 5, 10\), the estimated percentages of rejections favor the classical bootstrap approach, because the other approaches give larger values of \({\hat{p}}\). In these cases, the differences between the classical bootstrap and the other simulation methods are quite distinct (even 0.012–0.015). Secondly, for \(n = 30, 100\), the outputs are more accurate if the d-method or the w-method is used. Then, the differences among the various simulated values of \({\hat{p}}\) are quite small (about 0.001–0.002).

In the case of type C fuzzy numbers (see Table 14), the d-method and the w-method seem to produce the most accurate estimates of \({\hat{p}}\). This can be seen especially for the smaller samples (\(n = 5, 10\)), where the classical bootstrap approach gives an estimate of the rejection rate that is about 0.004 smaller than for the other methods. For the largest sample (\(n=100\)), the d-method is favored, but once again, the differences among the simulated values of \({\hat{p}}\) are quite small.

Table 12 Simulated values of \({\hat{p}}\) for the GRMCG-test, type A of LRFNs
Table 13 Simulated values of \({\hat{p}}\) for the GRMCG-test, type B of LRFNs
Table 14 Simulated values of \({\hat{p}}\) for the GRMCG-test, type C of LRFNs

Taking the whole analysis into account, it is not possible to single out an unambiguously best simulation procedure that gives the most accurate values of \({\hat{p}}\). However, the application of the d-method or the w-method looks promising, especially for smaller initial samples.

6 Conclusions

In this paper, we propose two simulation algorithms for the generation of samples of LRFNs, namely the d-method and the w-method. Both algorithms are based on the resampling paradigm and utilize a primary sample of fuzzy numbers in order to randomly generate a secondary, bootstrap-like sample. The generation is based on \(\alpha \)-cuts of LRFNs and a strictly nonparametric approach, without the necessity of making additional assumptions about the source (or the model) of the primary sample.

Our contribution in this article is fourfold. Firstly, two new numerical algorithms for the simulation of samples of LRFNs are considered. These algorithms, similarly to the classical bootstrap methods, utilize a primary (initial) sample of random fuzzy numbers in order to generate secondary (bootstrap) fuzzy random samples. But, contrary to the classical bootstrap, the simulated secondary sets consist of values that are “not exactly the same” as those in the initial sample. In the first method, the modified direct method (called the d-method and described by a discrete probability distribution d(x)), information about the \(\alpha \)-cuts of the LRFNs from the primary set is used. In the second method (called the w-method), a mixed discrete–uniform probability distribution w(x) is used for generation purposes. In this approach, the information about the \(\alpha \)-cuts of the observations from the primary sample is modified in a certain way, using a non-informative uniform distribution. Both of the proposed methods generate sets of LRFNs whose diversity is, in a certain sense, greater than the diversity of the observations from the primary sample. However, this greater diversity is achieved without incorporating any additional, specific assumptions about the general probability model of the initial population. Hence, both approaches are strictly nonparametric.

Secondly, the outputs of these two methods are analyzed using the most important statistical measures. For both small and moderate primary samples, and two types of triangular fuzzy numbers, we check whether the generated secondary (bootstrap-like) samples imitate the statistical behavior of the initial population well. To do so, the mean and the standard deviation are calculated, and the applicability of the strong law of large numbers and of the law of the iterated logarithm is confirmed. We also compare the simulated secondary samples for the two proposed methods with the output of the classical bootstrap approach. The application of the d(x) and w(x) distributions in bootstrapping appears very promising, because the generated triangular numbers “mimic” the values from the initial sample very well. Moreover, with respect to the previously mentioned statistical measures, the generated values sometimes behave even better than in the case of the classical bootstrap applied to the same primary samples.

Thirdly, for the same sample sizes and two types of triangular fuzzy numbers, we check whether the simulated values are “close enough” to the fuzzy numbers from the initial set. The level of this proximity is measured using four types of measures (the supremum measure, the \(l_1\) metric, the extended Hausdorff distance, and the measure proposed by Tran and Duckstein (2002)). Once again, the obtained results are compared with the outcomes of the classical bootstrap approach. The performed analysis confirms that the fuzzy numbers generated using the d(x) and w(x) distributions are very close to the observations from the primary sample. Therefore, the two simulation procedures introduced in this paper can be used to form a secondary (bootstrap-like) sample that is “similar” to, but also, in some way, different from, the initial set of observations.
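Two of these proximity measures can be sketched via \(\alpha \)-cuts of triangular fuzzy numbers. The code below uses a discretized grid of \(\alpha \)-levels and illustrative names; the exact definitions and normalizations used in the paper may differ.

```python
def alpha_cut(tri, alpha):
    """Alpha-cut [L(alpha), U(alpha)] of a triangular fuzzy number (a, b, c)."""
    a, b, c = tri
    return a + alpha * (b - a), c - alpha * (c - b)

def supremum_distance(t1, t2, k=100):
    """Discretized supremum distance over k + 1 equally spaced alpha-levels."""
    best = 0.0
    for i in range(k + 1):
        alpha = i / k
        (la, ua), (lb, ub) = alpha_cut(t1, alpha), alpha_cut(t2, alpha)
        best = max(best, abs(la - lb), abs(ua - ub))
    return best

def l1_distance(t1, t2, k=100):
    """Discretized l1-type distance: |L1 - L2| + |U1 - U2| averaged
    over the alpha-levels."""
    total = 0.0
    for i in range(k + 1):
        alpha = i / k
        (la, ua), (lb, ub) = alpha_cut(t1, alpha), alpha_cut(t2, alpha)
        total += abs(la - lb) + abs(ua - ub)
    return total / (k + 1)
```

For a pure shift of a triangular fuzzy number, both endpoints of every \(\alpha \)-cut move by the same amount, so the supremum distance equals the shift.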

And finally, we check whether these two new simulation algorithms can be successfully applied to practical statistical problems. As an example, we have applied the d-method and the w-method in two statistical tests concerning the mean value of a population of fuzzy numbers. In these two tests, the outputs for both small and moderate primary samples have been analyzed for three types of triangular fuzzy numbers. As before, we have compared the three simulation procedures (the classical bootstrap and the two methods introduced by us). In all considered cases, the difference between the nominal significance level of the test and the empirical percentage of rejections of the true null hypothesis is used as a benchmark. Once again, the algorithms introduced in this paper show promising potential, because this difference is usually lower for the proposed bootstrap-like procedures, based on the d(x) or w(x) distributions, than for the classical bootstrap of fuzzy random variables.

Compared with the classical bootstrap, the proposed methods have one disadvantage, which appears when the considered fuzzy numbers have “natural” limits (e.g., when their supports must contain only nonnegative numbers). In such a case, some of the generated elements of the secondary (bootstrap-like) sample may not fulfill these requirements. One can then introduce certain modifications of the proposed methods (e.g., a simple curtailment) in order to eliminate such “unnatural” observations. However, the consequences of such modifications are not yet known and require further research.

It should be noted that fuzzy sets, introduced by Zadeh, are still the most popular tool for modeling non-random uncertainty (imprecision). There exist many extensions of fuzzy sets that can also be used for this purpose. For example, interval-valued fuzzy sets (IVFS), introduced independently by four different authors in 1975 (see Nowak and Hryniewicz (2018) for references), can be used in situations when the membership function of a fuzzy set cannot be precisely defined. Another very popular extension, widely known under the name of intuitionistic fuzzy sets (IFS), was introduced in Atanassov (1986) and can be used when imprecision is described in terms of membership and non-membership functions. Many of these different models are interrelated or even formally equivalent (see, e.g., Deschrijver and Kerre 2003). Probabilistic models for IVFS and IFS variables have already been proposed in the literature. However, statistical methods for the analysis of such imprecise data practically do not exist; notable exceptions (Akbari and Arefi 2013; Hesamian and Akbari 2017) are devoted to the analysis of IFS random data. The complex description of IVFS and IFS data makes inferential statistical procedures for these types of data extremely difficult. Therefore, the simulation methods considered in this paper could, after some necessary modifications, be applied in the statistical analysis of such data.