Statistical inference on quantiles of two independent populations under uncertainty

Statistical inference is the process of drawing conclusions about underlying population(s) using sample data to either confirm or falsify hypotheses. However, the complexity of real-life problems often makes the underlying statistical models inadequate, as information is often imprecise in many respects. To address this common problem, some papers have been published on modifications and extensions of test concepts by employing tools of fuzzy statistics. In this paper, we present a non-parametric test for the difference between quantiles of two independent populations based on fuzzy random variables. For this purpose, we consider the fuzzy quantile function and its estimation based on α-values of fuzzy random variables. We then provide a fuzzy test based on the fuzzy empirical distribution function for the difference of fuzzy order statistics from these independent populations. We also suggest a specific degree-based criterion to compare the fuzzy test statistics at a specific significance level to decide whether the underlying fuzzy null hypothesis can be rejected or not. The effectiveness of the proposed two-sample test on quantiles is investigated via numerical examples.


Introduction
Hypothesis testing regarding an assumption about the probability distribution of one random variable or a set of random variables is a main field of statistical inference. Such tests demand a well-defined modeling of the tested hypothesis, i.e., precisely stated requirements in relation to the distribution of the underlying random variable(s) (Chukhrova and Johannssen 2020a). In many practical applications, it is necessary to compare two independent populations concerning their central tendencies or other distribution-related criteria (O'Gorman 2004; Shi and Tao 2008; Taff 2018). Techniques of statistical inference are employed to conclude whether the difference of interest between the two populations is significant or not. In two-sample hypothesis tests, the populations are usually compared in terms of their central tendency measures (like the means). Although comparing two populations concerning their means is a common problem, there are situations where one needs to compare other characteristics of the populations such as quantiles (see, e.g., Hutson 2009; Viertl 2006; Kosorok 1999). When using quantiles (e.g., quartiles) instead of the means, the test decision is based on robust location parameters, so that outliers have little impact on the test decision (Chukhrova and Johannssen 2021b).
Further, censoring or truncation can complicate estimation of entire distribution functions, and an examination of a collection of quantiles is a reasonable alternative (Gözde and Özdemir 2018). In addition, in many real-life applications (such as psychology, biology, medicine, and economics), the quantiles of the underlying characteristic variables are important boundaries for decision-making (Wang and Hettmansperger 1990; Farrell et al. 1997). For instance, differences in the tails of two distributions, which can be assessed via quantiles, are often of interest in such cases. However, the main advantage of non-parametric testing on quantiles compared to common tests such as Student's t-test is that there is no need for additional assumptions related to homoscedasticity or normality of the two population distributions, which are often not fulfilled in practical applications (for instance, when the distributions are asymmetric or irregularly shaped). That is, the results of Student's t-test can be misleading when its assumptions are not satisfied. For these reasons, several non-parametric two-sample quantile-based tests have been developed (Heinzl and Mittlboeck 2017; Hutson 2009).
However, in situations where point- or interval-valued formulations of hypotheses are too rigid for real-life problems, these formulations can force the practitioner into unreasonable decision procedures. In this case, common statistical inference techniques are inappropriate for testing a hypothesis. Moreover, there are many situations in practical applications where the observations cannot be measured as crisp quantities, because information is often imprecise, incomplete, linguistic, noisy, qualitative, or vague (Chukhrova and Johannssen 2019). In contrast, fuzzy hypothesis testing provides a more realistic framework for such hypothesis testing problems, as fuzzy set theory is a natural tool for modeling and analyzing subjective and imprecise concepts. A fuzzy hypothesis allows for a more appropriate treatment of the unknown parameter(s): instead of specifying the hypothesized values over a crisp interval, it allows, for example, specifying a smooth transition from "preferred" to "non-preferred" or from "possible" to "impossible" values in terms of an appropriately modeled membership function (Chukhrova and Johannssen 2020a). Furthermore, hypothesis testing in fuzzy environments makes it possible to incorporate available expert knowledge into the test procedure, taking into consideration the economic context or possibilistic aspects. Therefore, fuzzy modeling approaches provide appropriate techniques for dealing with these various types of uncertain information (Chukhrova and Johannssen 2021a).
Considering previous fuzzy non-parametric tests for the two-sample case, they essentially rely on comparing fuzzy medians. As comparing any quantiles of two populations is an important issue, it is necessary to develop a methodology to compare quantiles of two populations based on fuzzy data. In this study, therefore, we introduce a new idea of non-parametric testing for comparing fuzzy quantiles of two independent populations based on fuzzy random variables. Since the observed data are fuzzy quantities, it is a natural step to consider the components of the population such as distribution function, order statistics and quantile functions as fuzzy quantities as well. In this regard, we extend the concept of the fuzzy quantile function of fuzzy random variables and their empirical estimation based on fuzzy data. We also construct the respective hypotheses to compare fuzzy quantiles of the populations via ranking criteria and introduce a test statistic that employs the α-cuts of fuzzy numbers. Then, a procedure for constructing a fuzzy test function to reject or not reject the underlying null hypothesis related to the comparison of fuzzy quantiles is presented. Therefore, besides testing on the medians of both populations, the introduced method can be applied for any fuzzy quantile functions of two populations related to, e.g., percentiles or deciles. For practical reasons, the proposed method is illustrated via some application examples.
The rest of this paper is organized as follows. Section 2 reviews essential concepts related to fuzzy numbers and fuzzy random variables. Section 3 introduces the notion of fuzzy quantiles and discusses their empirical estimation considering fuzzy random variables. In Sect. 4, the non-parametric hypothesis test for comparing fuzzy quantiles of two independent populations is developed. In Sect. 5, practical applications of the proposed test are illustrated. Finally, conclusions are provided in Sect. 6.

Preliminaries
This section reviews some necessary basic definitions of fuzzy numbers and fuzzy random variables.

Fuzzy sets and fuzzy numbers
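A fuzzy set $A$ on the real line $\mathbb{R}$ is defined by its membership function $\mu_A : \mathbb{R} \to [0, 1]$. As for the practical handling of fuzzy numbers (FNs), they are often modeled via a functional parametric form called LR-FN, $A = (a; l_a, r_a)_{LR}$. The membership function of an LR-FN $A$ is given by
$$\mu_A(x) = \begin{cases} L\left(\dfrac{a - x}{l_a}\right), & a - l_a \le x \le a, \\ R\left(\dfrac{x - a}{r_a}\right), & a < x \le a + r_a, \\ 0, & \text{otherwise}, \end{cases}$$
where $l_a > 0$ is the left spread, $r_a > 0$ is the right spread, and $L$ and $R$ are reference functions defining the left and the right shapes of the FN, respectively, where $L, R : [0, 1] \to [0, 1]$ should satisfy the following conditions: $L(0) = R(0) = 1$, $L(1) = R(1) = 0$, and $L$ and $R$ are continuous and strictly decreasing. The set of all LR-FNs is represented by $F(\mathbb{R})$. Furthermore, the most commonly used (unimodal) LR-FNs (with $L(x) = R(x) = \max\{0, 1 - x\}$) are triangular FNs (TFNs). The membership function of a TFN, denoted by $A = (a; l_a, r_a)_T$, is given by:
$$\mu_A(x) = \begin{cases} \dfrac{x - (a - l_a)}{l_a}, & a - l_a \le x \le a, \\ \dfrac{(a + r_a) - x}{r_a}, & a < x \le a + r_a, \\ 0, & \text{otherwise}. \end{cases}$$
Remark 1 (Hesamian and Shams 2016) For a given $A \in F(\mathbb{R})$, the mapping $\alpha \mapsto A_\alpha$ (the α-values of $A$) is defined by
$$A_\alpha = \begin{cases} A^L[2\alpha], & 0 \le \alpha \le 0.5, \\ A^U[2(1 - \alpha)], & 0.5 < \alpha \le 1, \end{cases}$$
where $A^L[\alpha]$ and $A^U[\alpha]$ denote the lower and upper limits of α-cuts of $A$, respectively. Then, it follows: $A[\alpha] = [A_{\alpha/2}, A_{1-\alpha/2}]$. For instance, α-cuts of an LR-fuzzy number $A = (a; l_a, r_a)_{LR}$ can be calculated as:
$$A[\alpha] = \left[a - l_a L^{-1}(\alpha),\; a + r_a R^{-1}(\alpha)\right].$$
Remark 2 (Hesamian et al. 2019) Note that for all $A, B \in F(\mathbb{R})$, $\lambda \in \mathbb{R}$ and $\alpha \in [0, 1]$, the following arithmetic operations on fuzzy numbers can be defined:
$$(A \oplus B)_\alpha = A_\alpha + B_\alpha, \qquad (\lambda \otimes A)_\alpha = \begin{cases} \lambda A_\alpha, & \lambda \ge 0, \\ \lambda A_{1-\alpha}, & \lambda < 0, \end{cases}$$
where $\oplus$ and $\otimes$ denote common arithmetic operators of fuzzy numbers.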
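As a minimal sketch of how TFNs can be handled numerically, the following Python snippet represents a TFN (a; l_a, r_a)_T, evaluates its membership function and α-cuts, and applies ⊕ and ⊗. The class name `TFN` and the helpers `add` and `scale` are illustrative names that do not come from the paper, and the formulas assume the triangular shape L(x) = R(x) = max{0, 1 − x} together with the usual interval arithmetic on α-cuts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (a; l, r)_T: centre a, left spread l, right spread r."""
    a: float
    l: float
    r: float

    def membership(self, x: float) -> float:
        """Degree to which x belongs to the fuzzy number."""
        if x == self.a:
            return 1.0
        if self.a - self.l <= x < self.a:
            return 1.0 - (self.a - x) / self.l
        if self.a < x <= self.a + self.r:
            return 1.0 - (x - self.a) / self.r
        return 0.0

    def cut(self, alpha: float) -> tuple:
        """Alpha-cut (lower bound, upper bound) for alpha in [0, 1]."""
        return (self.a - self.l * (1.0 - alpha),
                self.a + self.r * (1.0 - alpha))


def add(A: TFN, B: TFN) -> TFN:
    """A (+) B: centres and spreads add, so the alpha-cut bounds add."""
    return TFN(A.a + B.a, A.l + B.l, A.r + B.r)


def scale(lam: float, A: TFN) -> TFN:
    """lam (x) A: a negative lam swaps the roles of the two spreads."""
    if lam >= 0:
        return TFN(lam * A.a, lam * A.l, lam * A.r)
    return TFN(lam * A.a, -lam * A.r, -lam * A.l)
```

For example, `TFN(2.0, 1.0, 1.0).cut(0.5)` gives the interval from 1.5 to 2.5, and multiplying `TFN(2.0, 1.0, 3.0)` by −2 yields a TFN whose left and right spreads are exchanged before scaling.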
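Definition 1 (Yuan 1991) For two FNs $A$ and $B \in F(\mathbb{R})$, let
$$\Delta_{AB} = \int_{\{\alpha :\, A^U[\alpha] > B^L[\alpha]\}} \left(A^U[\alpha] - B^L[\alpha]\right) d\alpha + \int_{\{\alpha :\, A^L[\alpha] > B^U[\alpha]\}} \left(A^L[\alpha] - B^U[\alpha]\right) d\alpha.$$
The preference degree "$A$ is larger than $B$" is defined by:
$$P_d(A \succ B) = \frac{\Delta_{AB}}{\Delta_{AB} + \Delta_{BA}}.$$
Definition 2 For two FNs $A$ and $B$, it holds that:
The preference criterion $P_d$ meets the following properties:
Proposition 1 Let $A$, $B$, $C$ be three FNs in $F(\mathbb{R})$. Then, it holds:
Proof See Yuan (1991).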
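The preference degree can be approximated numerically. The sketch below assumes the commonly cited form of Yuan's (1991) relation, P_d(A ≻ B) = Δ_AB / (Δ_AB + Δ_BA), where Δ_AB integrates the positive parts of A^U[α] − B^L[α] and A^L[α] − B^U[α] over α ∈ [0, 1]. The function names, the midpoint rule, and the convention of returning 0.5 for two identical crisp numbers are illustrative choices, not taken from the paper.

```python
def tfn_cut(a, l, r):
    """Alpha-cut function of a triangular fuzzy number (a; l, r)_T."""
    return lambda alpha: (a - l * (1.0 - alpha), a + r * (1.0 - alpha))


def crisp_cut(c):
    """Alpha-cut function of a crisp number c (degenerate interval)."""
    return lambda alpha: (c, c)


def delta(cut_a, cut_b, n=1000):
    """Midpoint-rule approximation of Delta_AB on a grid of n alpha levels."""
    total = 0.0
    for k in range(n):
        alpha = (k + 0.5) / n
        a_lo, a_hi = cut_a(alpha)
        b_lo, b_hi = cut_b(alpha)
        # Only the positive parts of the two differences contribute.
        total += max(a_hi - b_lo, 0.0) + max(a_lo - b_hi, 0.0)
    return total / n


def preference_degree(cut_a, cut_b, n=1000):
    """Approximate degree to which A is larger than B."""
    d_ab = delta(cut_a, cut_b, n)
    d_ba = delta(cut_b, cut_a, n)
    if d_ab + d_ba == 0.0:
        return 0.5  # assumed convention for two identical crisp numbers
    return d_ab / (d_ab + d_ba)
```

Comparing a fuzzy test statistic with a crisp threshold, as done later in the numerical examples, then amounts to calling `preference_degree(stat_cut, crisp_cut(threshold))` with a user-supplied α-cut function `stat_cut`.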
Definition 3 (Hesamian and Akbari 2018) The absolute error distance between two FNs A and B is defined as follows: The TFNs A, B, C satisfy the following conditions: (1)

Fuzzy random variables
In the following, we briefly give common definitions of fuzzy random variables, fuzzy cumulative distribution function and its estimator.
Definition 5 Two FRVs $X$ and $Y$ are called identically distributed and independent if $X_\alpha$ and $Y_\alpha$ are identically distributed and independent for all $\alpha \in [0, 1]$. Similarly, $X_1, \ldots, X_n$ is said to be a fuzzy random sample (FRS) of size $n$ if all $X_i$ are independent and identically distributed FRVs. An observed fuzzy random sample is denoted by $x_1, \ldots, x_n$.
Definition 6 (Hesamian et al. 2019) Let $X$ be a FRV and $\{X_n\}_{n=1}^{\infty}$ a collection of FRVs defined on the same probability space. Then, $X_n$ converges almost surely to $X$, denoted by $X_n \xrightarrow{a.s.} X$. For every $\varepsilon > 0$, it holds
Definition 7 Let $X_1, \ldots, X_m$ be a FRS of $X$. The $j$th order statistic of $X_1, \ldots, X_m$ is defined to be a FN with α-cuts $(X_{(j)})_\alpha = (X_\alpha)_{(j)}$.
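To illustrate Definition 7, the following sketch computes the jth fuzzy order statistic level by level: at each α, the α-values of the sample are sorted and the jth smallest is taken. It assumes TFN data stored as tuples `(a, l, r)` and the α-value mapping attributed to Hesamian and Shams (2016), i.e., the lower α-cut bound at level 2α for α ≤ 0.5 and the upper bound at level 2(1 − α) otherwise. The function names are hypothetical.

```python
def alpha_value(tfn, alpha):
    """Alpha-value of a TFN (a, l, r): increasing in alpha from a - l to a + r."""
    a, l, r = tfn
    if alpha <= 0.5:
        return a - l * (1.0 - 2.0 * alpha)   # lower bound of the 2*alpha cut
    return a + r * (2.0 * alpha - 1.0)       # upper bound of the 2*(1-alpha) cut


def order_statistic_value(sample, j, alpha):
    """Alpha-value of the j-th fuzzy order statistic (j = 1, ..., len(sample))."""
    values = sorted(alpha_value(x, alpha) for x in sample)
    return values[j - 1]
```

Note that the observation providing the jth smallest α-value may change with α, so a fuzzy order statistic is in general not one of the observed fuzzy numbers.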
Definition 8 (Hesamian and Chachi 2015) The fuzzy number $F_X(x)$ is said to be a fuzzy cumulative distribution function (FCDF) of $X$ if its α-cuts are defined by $(F_X(x))_\alpha = P(X_{1-\alpha} \le x)$.
Definition 9 (Hesamian and Chachi 2015) Let $x_1, \ldots, x_n$ be a FRS of $X$. The fuzzy number $F_n(x)$ is said to be a fuzzy empirical cumulative distribution function if its α-cuts are defined by
Lemma 2 Suppose that $X_1, \ldots, X_n$ is a fuzzy random sample with FCDF $F_X(x)$. Then,
Proof See Hesamian and Chachi (2015).
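A possible numerical reading of the fuzzy empirical distribution function is sketched below. It assumes, by analogy with Definition 8, that the value at level α is the proportion of observations whose (1 − α)-value does not exceed x; this analogy, the TFN tuple representation `(a, l, r)`, and the function names are assumptions made for illustration.

```python
def alpha_value(tfn, alpha):
    """Alpha-value of a TFN (a, l, r), as assumed from Hesamian and Shams (2016)."""
    a, l, r = tfn
    if alpha <= 0.5:
        return a - l * (1.0 - 2.0 * alpha)
    return a + r * (2.0 * alpha - 1.0)


def fuzzy_ecdf_value(sample, x, alpha):
    """Assumed level-alpha value of the fuzzy empirical CDF at the point x."""
    # Definition 8 uses X_{1-alpha}; the same index is used for the data here.
    hits = sum(1 for obs in sample if alpha_value(obs, 1.0 - alpha) <= x)
    return hits / len(sample)
```

Because the (1 − α)-values decrease as α grows, the returned proportion is non-decreasing in α, which is what is needed for these values to describe a fuzzy number.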

Fuzzy quantile function
In this section, the notions of fuzzy quantile function and fuzzy empirical quantile are introduced and discussed.
Definition 10 Let $X$ be a FRV. The fuzzy quantile function (FQF) of $X$ at level $\tau$ is defined by a FN with the following α-cuts:
Example 2 Let $X$ be a (normal) FRV (Puri and Ralescu 1985) with $X = \mu \oplus$ , where , where $Z_\tau$ denotes the $\tau$th quantile of the standard normal distribution. Therefore, the FQF of $X$ can be evaluated by $Q_X(\tau) = \mu \oplus Z_\tau \sigma$.
Definition 11 Let $x_1, \ldots, x_m$ be a FRS of $X$. The fuzzy empirical quantile function (FEQF) of $X$ at level $\tau$ is defined by a FN with the following α-cuts: , where $[k]$ represents the integer part of $k$.
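The idea of an empirical fuzzy quantile can be sketched as follows: at each level, the α-values of the observations are sorted and one order statistic is selected. The index rule used here, [nτ] + 1 capped at n, is one common convention and is only an assumption, as is the relation between α-cuts and α-values (lower bound at level α/2, upper bound at level 1 − α/2). The data are taken to be TFNs stored as `(a, l, r)`, and all names are hypothetical.

```python
import math


def alpha_value(tfn, alpha):
    """Alpha-value of a TFN (a, l, r), as assumed from Hesamian and Shams (2016)."""
    a, l, r = tfn
    if alpha <= 0.5:
        return a - l * (1.0 - 2.0 * alpha)
    return a + r * (2.0 * alpha - 1.0)


def quantile_value(sample, tau, alpha):
    """Alpha-value of the empirical fuzzy quantile at level tau in (0, 1)."""
    n = len(sample)
    index = min(math.floor(n * tau) + 1, n)   # assumed index rule
    values = sorted(alpha_value(x, alpha) for x in sample)
    return values[index - 1]


def quantile_cut(sample, tau, alpha):
    """Alpha-cut (lower, upper) built from the alpha/2 and 1 - alpha/2 values."""
    return (quantile_value(sample, tau, alpha / 2.0),
            quantile_value(sample, tau, 1.0 - alpha / 2.0))
```

With three observations and τ = 0.5, for instance, the rule selects the second smallest α-value at every level, which gives a fuzzy median.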
Example 3 Consider the data set given in Table 1 with From Definition 9, first note that where Therefore, at quantile level $\tau \in (0, 1)$, the FEQF of $X$ is given as follows: Table 2 shows the lower and upper bounds of $Q_n(\tau)[\alpha]$ for $\tau = 0.25$, $0.50$ and $0.75$ and some values of $\alpha$. The plots of $Q_n(0.25)$, $Q_n(0.5)$ and $Q_n(0.75)$ are presented in Fig. 1.
Lemma 4 Suppose that $X_1, \ldots, X_n$ is a fuzzy random sample with FQF $Q_X(\tau)$. Then, $Q_n(\tau)$
Proof As holds for every $\alpha \in [0, 1]$, it follows that is satisfied for every $\varepsilon > 0$, which completes the proof.

Hypothesis test for comparing fuzzy quantiles of two populations
Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be random samples from two independent populations with absolutely continuous distribution functions $F_X$ and $G_Y$, respectively. Also, let $X_{(1)}, \ldots, X_{(m)}$ and $Y_{(1)}, \ldots, Y_{(n)}$ be the corresponding order statistics. The null hypothesis of interest is , where $\tau$ and $\nu$ are two quantile levels. A test statistic for the difference of the empirical quantile functions from different populations can be defined by (Heinzl and Mittlboeck 2017; Hutson 2009) where
Now, let $\tilde{X}_1, \ldots, \tilde{X}_m$ and $\tilde{Y}_1, \ldots, \tilde{Y}_n$ be two independent FRSs from two populations with FCDFs $F_X$ and $F_Y$. In the following, a procedure is established for comparing fuzzy quantiles of two populations. For this purpose, consider the following fuzzy hypotheses concerning quantiles of two populations:
Definition 12 Let $X$ and $Y$ be two FRVs. The hypotheses of interest are defined as For testing the above hypotheses, we employ the following test statistic.
Definition 13 Let $\tilde{X}_1, \ldots, \tilde{X}_m$ and $\tilde{Y}_1, \ldots, \tilde{Y}_n$ be two independent FRSs from two FCDFs $F_X$ and $F_Y$. The α-cuts of the fuzzy test statistic are defined by where The test decision on rejecting or not rejecting $H_0$ can be made as follows:
Definition 14 Let us consider the problem of testing the fuzzy hypothesis $H_0$ versus $H_1^A$ or $H_1^B$ based on two independent FRSs $\tilde{x}_1, \ldots, \tilde{x}_m$ and $\tilde{y}_1, \ldots, \tilde{y}_n$. Then, at significance level $\delta$, the fuzzy test is defined as a fuzzy set: 1. As for testing $H_0$ versus $H_1^A$, we use , Here, "Accept" and "Reject" stand for non-rejection and rejection of $H_0$, respectively. Since $\varphi_\delta(\text{Reject}) + \varphi_\delta(\text{Accept}) = 1$, at significance level $\delta$, one cannot reject the null hypothesis if $\varphi_\delta(\text{Accept}) > \varphi_\delta(\text{Reject})$ or $\varphi_\delta(\text{Accept}) \ge 0.5$.
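The decision rule stated at the end of Definition 14 can be written as a short sketch. It takes the degree of rejection as given (in the paper it is obtained as a preference degree between the fuzzy test statistic and a crisp threshold) and uses only the two facts stated above: the two degrees sum to one, and the null hypothesis is not rejected when the degree of acceptance is at least 0.5. The function name `fuzzy_decision` is hypothetical.

```python
def fuzzy_decision(reject_degree):
    """Return the fuzzy test as a dict of degrees plus the resulting decision."""
    if not 0.0 <= reject_degree <= 1.0:
        raise ValueError("the degree of rejection must lie in [0, 1]")
    accept_degree = 1.0 - reject_degree        # the two degrees sum to one
    fuzzy_test = {"Reject": reject_degree, "Accept": accept_degree}
    # "Accept" stands for non-rejection of the null hypothesis.
    decision = "Accept" if accept_degree >= 0.5 else "Reject"
    return fuzzy_test, decision
```

With the degrees of rejection reported in the numerical examples, `fuzzy_decision(0.72)` leads to "Reject" and `fuzzy_decision(0.21)` leads to "Accept".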
Remark 3 Since the decision to reject or not reject $H_0$ versus $H_1^A$ or $H_1^B$ is made via a fuzzy test, it is natural to defuzzify the decision in order to obtain an exact decision similar to classical statistical hypothesis testing. For this purpose, note that As for the interpretation of the test decision, it can be done similar to the classical approach as follows: , then $H_0$ is rejected; otherwise, it cannot be rejected.
Remark 4 As a special case, the proposed method can be employed for comparing the fuzzy medians of two populations. In this regard, Grzegorzewski (2005, 2009) introduced fuzzy tests for comparing two and $k$ (crisp) population medians based on fuzzy random variables, respectively. He developed a fuzzy test statistic by employing fuzzy random variables and proposed a fuzzy test based on the necessity ranking criterion. However, the method presented in this paper follows a different strategy for comparing fuzzy quantiles of two populations based on fuzzy random variables. The proposed fuzzy quantile technique includes the following procedure:
1. Extending the quantile of a FRV as a FN
2. Extending the empirical estimator of a fuzzy quantile based on a FRS
3. Investigating the relationship between a fuzzy quantile and its corresponding estimator for large sample cases
Then, a non-parametric statistical hypothesis test was developed for comparing any fuzzy quantiles of two populations based on two independent fuzzy random samples.

Numerical examples
In this section, the feasibility and effectiveness of the proposed non-parametric two-sample test on quantiles are examined via numerical examples. Note that no existing method is available for a meaningful comparison, as other two-sample fuzzy tests rely on comparing the means or medians (or variances) of two populations.
Example 4 A random sample of 30 identical twins underwent psychological tests to measure their aggressiveness. We are interested in comparing the twins to see whether the firstborn twin tends to be more aggressive than the other one. Assume that, due to limitations in psychological measurements, the results of the evaluations are reported as TFNs for the firstborn as $(x; 0.02x)_L$ and the second born as $(y; 0.02y)_L$ with $L(x) = \max\{0, 1 - x\}$. The fuzzy test statistic can be determined based on Definition 13. The lower and upper bounds of $T_{30,30}$ are shown in Table 4 based on various values of $\alpha \in (0, 1)$. The plot of $T_{30,30}$ is shown in Fig. 2. According to Definition 14, the fuzzy test can be performed with $\varphi^A_{0.05}(\text{Reject}) = P_d(T_{30,30} \succ 0.975) = 0.72$. Therefore, at significance level 0.05, the fuzzy null hypothesis is rejected with a degree of 0.72 and not rejected with a degree of 0.28. As for the defuzzification of this test decision, note that it holds $M_{T_{30,30}} = 0.984 > 1 - \delta/2 = 0.975$. Following this approach, the final decision is to reject the null hypothesis at level $\delta = 0.05$.
Example 5 Let us consider the rocket-motor experiment data set based on Weerahandi and Johnson (1992). It is of interest to make inference on the reliability of the rocket motor at the highest operating temperature of 59 °C. At this temperature, the distribution of the operating pressure $Y$ tends to be closest to the distribution of the chamber burst strength $X$. It is assumed that the observations can be reported as TFNs via $\tilde{x}_i = (x_i; 0.05x_i)_T$ and $\tilde{y}_i = (y_i; 0.1y_i)_T$, where the observations $x_i$ and $y_i$ are given in Table 5. At significance level $\delta = 0.05$, we test the following pair of hypotheses: According to Definition 13, the plot of $T_{17,23}$ is shown in Fig. 3. The fuzzy test yields $\varphi^B_{0.05}(\text{Reject}) = P_d(T_{17,23} \prec 0.025) = 0.21$ and $\varphi^B_{0.05}(\text{Accept}) = 0.79$. Thus, at significance level 0.05, the fuzzy null hypothesis is not rejected with a degree of 0.79 and rejected with a degree of 0.21. Furthermore, the defuzzified value related to $T_{17,23}$ ($M_{T_{17,23}} = 0.057$) is larger than $\delta/2 = 0.025$, so the decision is not to reject the null hypothesis at level $\delta = 0.05$.

Conclusions
In this article, an inferential procedure was developed for comparing fuzzy quantiles of two independent populations. For this purpose, the notion of the fuzzy quantile of a fuzzy random variable was introduced. Then, an estimator of the proposed fuzzy quantile function was developed according to a fuzzy random sample. The estimation procedure was illustrated based on some numerical examples. Further, the large sample property of the proposed fuzzy empirical quantile function was analyzed based on an absolute error distance for fuzzy numbers. In addition, the concept of the fuzzy test statistic was introduced based on fuzzy order statistics of two fuzzy random samples. To test the fuzzy hypotheses on quantiles of two populations, the obtained fuzzy test statistic and the crisp significance level were compared using a criterion called preference degree. As the proposed fuzzy test leads to a degree of rejection or non-rejection of the underlying fuzzy null hypothesis, we also proposed an approach to defuzzify the fuzzy test decision in order to reach a crisp test decision, which is important for practical usage.
The results of the practical applications indicate that the proposed method is effective for comparing fuzzy quantiles of two independent populations. One of the advantages of the proposed method is that it can be applied to all kinds of fuzzy numbers. As for potential future investigations, applying the proposed method to other types of imprecision such as intuitionistic fuzzy data and/or intuitionistic fuzzy hypotheses could be a promising direction. As another idea for future research, the presented methodology could also be extended to the case when more than two populations need to be compared in terms of their quantiles.

Fig. 2 The observed fuzzy test statistics in Example 4

Fig. 3 The observed fuzzy test statistics in Example 5

Table 1
Data set in Example 3

Table 3
Data set in Example 4

Table 5
Data set in Example 5