Analysis of variance in uncertain environments

One-way analysis of variance (ANOVA) is used to analyze experimental data in which there is a continuous response variable and a single independent classification variable. In this paper, we extend one-way ANOVA to the case where the observed data are imprecise numbers rather than real numbers. Several fast computational formulas are derived for symmetric triangular and normal fuzzy data. As in classical ANOVA, the total observed variation in the response variable is decomposed into the observed variation due to the effects of the classification variable and the observed variation due to random error. A real case study is given to illustrate the proposed method.


Introduction and literature review

ANOVA
Analysis of variance (ANOVA), developed by Ronald Fisher, is concerned with analyzing variation in the means of several independent populations. The most important point in traditional ANOVA is a test of the significance of the differences among population means. This test permits the user to conclude whether the differences among the means of several populations are too large to be attributed to sampling error [7]. The basic formulas of the classical ANOVA model can be found in any statistics textbook [1,10,19]; the classical ANOVA is also briefly reviewed, following [21], in "Classical ANOVA" of this manuscript.
In many applied sciences like geology, astrology, economics, medical sciences, engineering and environmental sciences, there exist actual cases where imprecise quantities can be assigned to experimental results. In such practical cases, non-precise numbers are suitable models to formalize and handle such imprecise quantities, which justifies the use of fuzzy sets in ANOVA. Previous related works on fuzzy ANOVA are listed in the next subsection.

Fuzzy ANOVA
During the past three decades, several scientific works applying fuzzy set theory to different areas of ANOVA have been published. These include: a one-way ANOVA testing procedure based on normal fuzzy random variables (FRVs) [18]; an asymptotic bootstrap ANOVA test for FRVs in the one-sample case [17]; an ANOVA method for functional data in Hilbert space [2]; an asymptotic bootstrap ANOVA test for simple FRVs in the multi-sample case [7,8]; an optimization method for the one-way ANOVA problem using the cuts of FRVs when observations are imprecise [29]; an investigation of one-way fuzzy ANOVA and its comparison with a regression model [4]; a processing method for ANOVA with vague data using moment correction [15]; a one-way ANOVA for triangular fuzzy numbers on the basis of the extension principle [21]; a one-way ANOVA for closed interval observations [9]; the R package 'SAFD' for ANOVA tests based on defuzzification of FRVs [16]; an extension of the classical ANOVA test statistics to the fuzzy means of imprecise observations by a bootstrap approach [25]; an ANOVA test for FRVs based on a least squares approach [12]; and a decision-rule approach to testing fuzzy hypotheses in ANOVA with triangular imprecise observations and vague hypotheses [13].
Unlike these studies, this paper presents another simple method for one-way ANOVA, as a meaningful extension of the classical ANOVA, where the observations are symmetric triangular or normal fuzzy numbers. The proposed ANOVA method is based on a distance between symmetric fuzzy numbers and is easy for practitioners to use.
The organization of this paper is as follows. After introducing symmetric triangular and normal fuzzy numbers, some arithmetic operations for imprecise numbers are reviewed in "Preliminaries". In "Classical ANOVA", classical ANOVA is briefly explained following [21]. In "Analysis of variance based on fuzzy observations", a new method for testing ANOVA with fuzzy observations is provided and discussed. In "Fast computation formulas", several fast computational formulas are presented for symmetric triangular and normal fuzzy data. A case study on several brands of soap, which illustrates the ideas of this paper, is presented in "A case study on the process of soap production". Conclusions are given in the final section.

Preliminaries
In this section, we briefly review some definitions that will be referred to throughout this paper. Let $X$ be a universal set; a mapping $A : X \to [0, 1]$ is called a fuzzy set on $X$. In particular, let $F(R)$ be the set of fuzzy numbers on $R$ (the set of real numbers). The $\alpha$-cut of $A$ is the crisp set given by $A_\alpha = \{x \in X \mid A(x) \ge \alpha\}$, for any $\alpha \in (0, 1]$.
(i) A special case of fuzzy numbers, called the symmetric triangular fuzzy number (STFN), is denoted by $T(a, s_a)$.
(ii) Another special case of fuzzy numbers, called the normal fuzzy number (NFN), is denoted by $N(a, s_a)$. Here, $a \in R$ and $s_a \in R^+$ are called the core/mean and the spread of STFNs and NFNs, respectively. Moreover, $F_T(R)$ and $F_N(R)$ denote the sets of all STFNs and NFNs on $R$ [23].
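The membership functions of $T(a, s_a)$ and $N(a, s_a)$ are not reproduced in the text above, so the following Python sketch rests on assumptions. For the STFN it uses the usual triangular shape with peak at the core $a$ and support $[a - s_a, a + s_a]$. For the NFN it assumes the form $\exp(-((x-a)/s_a)^2)$, chosen because it agrees with the coefficient $1/(m+1)$ that appears in Eq. (20); other parameterizations of a normal fuzzy number exist. All function names are hypothetical.

```python
import math


def stfn_membership(x, a, s):
    """Membership degree of x in T(a, s): triangular, peak at a, support [a - s, a + s]."""
    return max(0.0, 1.0 - abs(x - a) / s)


def nfn_membership(x, a, s):
    """Membership degree of x in N(a, s), ASSUMED to be exp(-((x - a) / s) ** 2)."""
    return math.exp(-((x - a) / s) ** 2)


def stfn_cut(alpha, a, s):
    """Alpha-cut (lower, upper) of T(a, s) for alpha in (0, 1]."""
    half_width = s * (1.0 - alpha)
    return (a - half_width, a + half_width)


def nfn_cut(alpha, a, s):
    """Alpha-cut (lower, upper) of N(a, s) for alpha in (0, 1], under the assumed form."""
    half_width = s * math.sqrt(-math.log(alpha))
    return (a - half_width, a + half_width)


# Example: "approximately 2" with spread 1.
print(stfn_cut(0.5, 2.0, 1.0))
print(nfn_cut(0.5, 2.0, 1.0))
```

In both cases the endpoints of an $\alpha$-cut have membership degree exactly $\alpha$, which is a quick way to check that a cut function and a membership function belong together.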
The following distance will be used throughout Sect. 4 to solve the problem of testing ANOVA based on fuzzy observations.
Hence, using $g(\alpha) = \frac{m+1}{2}\alpha^{m}$, $m = 1, 2, 3, \ldots$, in Definition 2.2, the distance can be evaluated in closed form. Note that in some cases the integral used in Definition 2.2 is an improper integral, as in the proof of Theorem 2.2.
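Definition 2.2 is not reproduced above, so the sketch below assumes one plausible form of the squared distance: the integral over $\alpha \in (0, 1]$ of $g(\alpha)$ times the sum of squared differences of the lower and upper $\alpha$-cut endpoints. Under that assumption, and with the assumed membership shapes (triangular for STFNs, $\exp(-((x-a)/s)^2)$ for NFNs), the integral gives $(a-b)^2 + \frac{2}{(m+2)(m+3)}(s_a-s_b)^2$ for STFNs and $(a-b)^2 + \frac{1}{m+1}(s_a-s_b)^2$ for NFNs. These are the same coefficients that appear in Eqs. (15) and (20). The code checks the closed forms against a midpoint-rule integration; all names are hypothetical.

```python
import math


def weight(alpha, m):
    """Weight function g(alpha) = (m + 1) / 2 * alpha ** m from the text."""
    return (m + 1) / 2.0 * alpha ** m


def stfn_cut(alpha, a, s):
    half = s * (1.0 - alpha)
    return (a - half, a + half)


def nfn_cut(alpha, a, s):
    # Assumed NFN shape exp(-((x - a) / s) ** 2).
    half = s * math.sqrt(-math.log(alpha))
    return (a - half, a + half)


def squared_distance(cut, A, B, m=1, steps=20000):
    """Midpoint-rule approximation of the ASSUMED weighted squared distance."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) * h
        la, ua = cut(alpha, *A)
        lb, ub = cut(alpha, *B)
        total += weight(alpha, m) * ((la - lb) ** 2 + (ua - ub) ** 2) * h
    return total


def stfn_closed_form(A, B, m=1):
    (a, sa), (b, sb) = A, B
    return (a - b) ** 2 + 2.0 * (sa - sb) ** 2 / ((m + 2) * (m + 3))


def nfn_closed_form(A, B, m=1):
    (a, sa), (b, sb) = A, B
    return (a - b) ** 2 + (sa - sb) ** 2 / (m + 1)


A, B = (1.0, 0.5), (2.0, 1.5)  # (core, spread) pairs, illustrative values only
print(squared_distance(stfn_cut, A, B), stfn_closed_form(A, B))
print(squared_distance(nfn_cut, A, B), nfn_closed_form(A, B))
```

For NFNs the integrand contains $-\ln\alpha$, which is unbounded as $\alpha \to 0$; the factor $\alpha^m$ with $m \ge 1$ keeps the integral finite, which is consistent with the remark on improper integrals.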

Classical ANOVA
The classical ANOVA model is covered in many statistics textbooks and well-known references, e.g., [1,10,19]. For a short introduction to and review of the ANOVA methodology, we refer to "Classical ANOVA" in [21], since the notation used there is the same as here. Accordingly, we only present some classical formulas in Table 1, known as the ANOVA table. To test whether the factor level means $\mu_i$ are equal, classical ANOVA is used to accept or reject the hypothesis "$H_0: \mu_1 = \mu_2 = \cdots = \mu_r$" against "$H_1$: not all $\mu_i$'s are equal", based on observed random samples [11].
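As a reminder of what the ANOVA table contains, the following Python sketch computes the textbook one-way quantities for crisp data: the total, treatment and error sums of squares, the two mean squares and the F statistic. Table 1 itself is not reproduced here, so the formulas are the standard ones rather than a transcription of it, and the function name and the data are purely illustrative.

```python
def one_way_anova(groups):
    """Classical one-way ANOVA quantities for a list of groups of real numbers."""
    r = len(groups)                         # number of factor levels
    n_t = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_t
    level_means = [sum(g) / len(g) for g in groups]

    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)
    sstr = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, level_means))
    sse = sum((y - mu) ** 2 for g, mu in zip(groups, level_means) for y in g)

    mstr = sstr / (r - 1)      # treatment mean square
    mse = sse / (n_t - r)      # error mean square
    return {"sst": sst, "sstr": sstr, "sse": sse, "mstr": mstr, "mse": mse, "f": mstr / mse}


# Illustrative data only: three levels with three observations each.
table = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(table)
```

For these numbers the decomposition sst = sstr + sse reads 32 = 26 + 6, and the F statistic is 13.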

Analysis of variance based on fuzzy observations
Quantities of continuous variables are not precise numbers, and a more appropriate way to describe them is to use imprecise/fuzzy numbers [24,26,27]. For illustrations of such situations, see the applied examples in [11,14,28]. It must be noted that in this section only the observed values of precise random variables are regarded as imprecise numbers, while the model for the observed values remains the precise model (7). A similar idea is used in [6,11,27]. In such a situation, the observations and recorded data can be regarded as fuzzy numbers $\tilde{y}_{ij}$, where $\tilde{y}_{ij}$ is interpreted as "approximately $y_{ij}$", for $i = 1, \ldots, r$ and $j = 1, \ldots, n_i$. In view of the above discussion, in this section and hereafter we assume a classical analysis of variance in which all components of the problem, such as the random variables, hypotheses and population parameters, are crisp, as introduced in "Classical ANOVA", and so the $Y_{ij}$'s are ordinary random variables. The only departure from the classical ANOVA assumptions in model (7) is that the sampled observations are fuzzy numbers rather than real numbers; nothing else is altered in the classical ANOVA test.
Using the distance between two fuzzy numbers introduced in Definition 2.2, the observed values of the statistics SST, SSTR, SSE, MSTR, MSE and F can be calculated by the extended formulas (8)-(10) based on fuzzy observations. In testing ANOVA based on fuzzy numbers, the observed value of the test statistic is then the real number $\tilde{f} = mstr/mse$ (11), in which $mstr = sstr/(r-1)$ and $mse = sse/(n_t - r)$ are the treatment mean square and the error mean square, respectively.

Remark 4.1
The major problem in testing ANOVA is calculating the sums of squares of fuzzy observations. In this paper, we have used a logical defuzzification in Eqs. (8)-(10), for the following reasons. (i) Although the square of an LR fuzzy number can be computed by the extension principle, the result is not necessarily in the class of LR fuzzy numbers [23]. Hence, one is faced with calculating the membership function of a sum of fuzzy numbers that do not necessarily have the same shape functions, based on Zadeh's extension principle. (ii) Calculating sst, sstr and sse by Zadeh's extension principle leads to a very vague observed statistic at the end of the computations, since the experimenter usually faces $\sum_{i=1}^{r} n_i$ fuzzy numbers in any ANOVA test. In other words, as $\sum_{i=1}^{r} n_i$ increases, the supports of sst, sstr and sse become larger, which increases the vagueness of the observed Fisher statistic. For instance, see the observed Fisher statistic in Sect. 8.2 of [21], which is obtained via Zadeh's extension principle. (iii) Following (ii), the truth level of the decision can be low in many applied situations; see the definition of "the fuzziness of the decision" in [11].
The decision rule
Let $\tilde{f}$ be the observed value of the test statistic and $F_{1-\alpha;\, r-1,\, n_t-r}$ be the $(1-\alpha)$th quantile of the Fisher distribution with $r-1$ and $n_t-r$ degrees of freedom. At the given significance level $\alpha$, $H_0$ is accepted if $\tilde{f} \le F_{1-\alpha;\, r-1,\, n_t-r}$; otherwise $H_1$ is accepted. The rejection region in testing ANOVA is $F > F_{1-\alpha;\, r-1,\, n_t-r}$, and so under hypothesis $H_0$ the error probability is $P(F > F_{1-\alpha;\, r-1,\, n_t-r}) = \alpha$. Therefore, in testing ANOVA based on fuzzy numbers, the p-value can be calculated by p-value $= P(F > \tilde{f})$, in which $\tilde{f}$ is given by (11) as the observed value of the test statistic on the basis of fuzzy observations.
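The decision rule can be sketched in a few lines of Python for the special case of three factor levels ($r = 3$, as in the soap case study), where the numerator degrees of freedom equal 2. In that case the F distribution has the closed-form tail $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$ with $d_2 = n_t - r$, so no statistics library is needed. This closed form does not hold for other values of $r$; a general implementation would call an F-distribution routine such as the one in `scipy.stats.f`. The function names and the observed value 26.0 are hypothetical.

```python
def f_tail_df1_equals_2(f, d2):
    """P(F > f) for an F distribution with 2 and d2 degrees of freedom (valid only for d1 = 2)."""
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


def f_critical_df1_equals_2(alpha, d2):
    """(1 - alpha)th quantile of F with 2 and d2 degrees of freedom, by inverting the tail."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


def decide(f_obs, alpha, d2):
    """Return (decision, p_value, critical_value) for the rule: reject H0 if f_obs > critical."""
    critical = f_critical_df1_equals_2(alpha, d2)
    p_value = f_tail_df1_equals_2(f_obs, d2)
    decision = "reject H0" if f_obs > critical else "accept H0"
    return decision, p_value, critical


# r = 3 levels and n_t = 12 observations give d2 = 9; f_obs is an illustrative value.
decision, p_value, critical = decide(26.0, 0.05, 9)
print(decision, p_value, round(critical, 3))
```

With $\alpha = 0.05$ and $d_2 = 9$ the critical value evaluates to about 4.256, and comparing $\tilde{f}$ with the critical value is equivalent to comparing the p-value with $\alpha$.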

Fast computation formulas

ANOVA for symmetric triangular fuzzy observations
In this subsection, we assume all observations are STFNs. Then the observed values of statistics are obtained by the following theorems in testing ANOVA on the basis of STFNs.
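Theorem 5.1 Based on symmetric triangular fuzzy observations $\tilde{y}_{ij} = T(y_{ij}, s_{y_{ij}}) \in F_T(R)$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, n_i$, the observed values of sst, sstr and sse in testing ANOVA are the following real numbers:
$$sst = sst_y + \frac{2}{(m+2)(m+3)}\, sst_{s_y}, \qquad (12)$$
$$sstr = sstr_y + \frac{2}{(m+2)(m+3)}\, sstr_{s_y} \qquad (13)$$
and
$$sse = sse_y + \frac{2}{(m+2)(m+3)}\, sse_{s_y}, \qquad (14)$$
where $sst_y$, $sstr_y$ and $sse_y$ are the observed sums of squares for the mean values of the $\tilde{y}_{ij}$'s, and $sst_{s_y}$, $sstr_{s_y}$ and $sse_{s_y}$ are the observed sums of squares for the spreads of the $\tilde{y}_{ij}$'s.
Theorem 5.2 Under the same assumption of Theorem 5.1, the observed value of the ANOVA test statistic is
$$\tilde{f} = \frac{(m+2)(m+3)\, mstr_y + 2\, mstr_{s_y}}{(m+2)(m+3)\, mse_y + 2\, mse_{s_y}}, \qquad (15)$$
in which $mstr_y = sstr_y/(r-1)$ and $mse_y = sse_y/(n_t-r)$ are the observed mean squares of the mean values of the $\tilde{y}_{ij}$'s, and $mstr_{s_y} = sstr_{s_y}/(r-1)$ and $mse_{s_y} = sse_{s_y}/(n_t-r)$ are the observed mean squares of the spreads of the $\tilde{y}_{ij}$'s.
Proof By Eqs. (13) and (14), the mean squares can be obtained as
$$mstr = mstr_y + \frac{2}{(m+2)(m+3)}\, mstr_{s_y} \qquad (16)$$
and
$$mse = mse_y + \frac{2}{(m+2)(m+3)}\, mse_{s_y}. \qquad (17)$$
Therefore, Eq. (15) follows directly from $\tilde{f} = mstr/mse$.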
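The fast formulas for STFNs amount to running the classical sums of squares twice, once on the cores and once on the spreads, and combining them with the weight $2/((m+2)(m+3))$ that appears in Eq. (15). A minimal Python sketch under that reading follows; the function names and the data are hypothetical and are not taken from the case study.

```python
def sums_of_squares(groups):
    """Classical (sst, sstr, sse) for a list of groups of real numbers."""
    n_t = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_t
    means = [sum(g) / len(g) for g in groups]
    sst = sum((v - grand) ** 2 for g in groups for v in g)
    sstr = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    sse = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    return sst, sstr, sse


def stfn_anova(centers, spreads, m=1):
    """Observed (sst, sstr, sse, f) for STFN observations T(center, spread)."""
    c = 2.0 / ((m + 2) * (m + 3))                     # weight of the spread part
    sst_y, sstr_y, sse_y = sums_of_squares(centers)   # cores
    sst_s, sstr_s, sse_s = sums_of_squares(spreads)   # spreads
    sst = sst_y + c * sst_s
    sstr = sstr_y + c * sstr_s
    sse = sse_y + c * sse_s
    r = len(centers)
    n_t = sum(len(g) for g in centers)
    f = (sstr / (r - 1)) / (sse / (n_t - r))
    return sst, sstr, sse, f


# Illustrative data only: three levels, three fuzzy observations per level.
centers = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
spreads = [[0.1, 0.2, 0.1], [0.3, 0.3, 0.3], [0.5, 0.4, 0.6]]
print(stfn_anova(centers, spreads, m=1))
```

Because each of the three sums of squares receives the same weight, the decomposition sst = sstr + sse carries over from the crisp case to the fuzzy one.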

ANOVA for normal fuzzy observations
In this subsection, we assume all observations are NFNs in testing ANOVA. Then the observed values of statistics in ANOVA model can be easily obtained by the following theorems.
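Theorem 5.3 Based on normal fuzzy observations $\tilde{y}_{ij} = N(y_{ij}, s_{y_{ij}}) \in F_N(R)$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, n_i$, the observed values of sst, sstr and sse in testing ANOVA are the following real numbers:
$$sst = sst_y + \frac{1}{m+1}\, sst_{s_y}, \qquad (19)$$
$$sstr = sstr_y + \frac{1}{m+1}\, sstr_{s_y} \qquad (20)$$
and
$$sse = sse_y + \frac{1}{m+1}\, sse_{s_y}, \qquad (21)$$
where $sst_y$, $sstr_y$ and $sse_y$ are the observed sums of squares for the mean values of the $\tilde{y}_{ij}$'s, and $sst_{s_y}$, $sstr_{s_y}$ and $sse_{s_y}$ are the observed sums of squares for the spreads of the $\tilde{y}_{ij}$'s.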

Remark 5.1
By Eq. (19) for ANOVA based on normal fuzzy observations, an increase in either $sst_y$ or $sst_{s_y}$ causes an increase in sst. Similar results can be stated on the basis of formulas (19) and (20) in ANOVA based on normal fuzzy numbers, and on the basis of formulas (12)-(14) for ANOVA based on symmetric triangular fuzzy numbers.

Theorem 5.4 Under the same assumption of Theorem 5.3, the observed value of the ANOVA test statistic is
$$\tilde{f} = \frac{(m+1)\, mstr_y + mstr_{s_y}}{(m+1)\, mse_y + mse_{s_y}},$$
in which $mstr_y$ and $mse_y$ are the observed mean squares of the mean values of the $\tilde{y}_{ij}$'s, and $mstr_{s_y}$ and $mse_{s_y}$ are the observed mean squares of the spreads of the $\tilde{y}_{ij}$'s.
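For normal fuzzy observations the test statistic can be computed directly from four classical mean squares. The Python sketch below assumes that the statistic is $mstr/mse$ with the spread parts weighted by $1/(m+1)$, as in Eq. (20), which after multiplying numerator and denominator by $m+1$ gives $((m+1)\,mstr_y + mstr_{s_y}) / ((m+1)\,mse_y + mse_{s_y})$. The function names and the data are hypothetical.

```python
def mean_squares(groups):
    """Classical (mstr, mse) for a list of groups of real numbers."""
    r = len(groups)
    n_t = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_t
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    sse = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    return sstr / (r - 1), sse / (n_t - r)


def nfn_f_statistic(centers, spreads, m=1):
    """Observed test statistic for NFN observations N(center, spread)."""
    mstr_y, mse_y = mean_squares(centers)   # mean squares of the cores
    mstr_s, mse_s = mean_squares(spreads)   # mean squares of the spreads
    return ((m + 1) * mstr_y + mstr_s) / ((m + 1) * mse_y + mse_s)


# Illustrative data only.
centers = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
spreads = [[0.1, 0.2, 0.1], [0.3, 0.3, 0.3], [0.5, 0.4, 0.6]]
print(nfn_f_statistic(centers, spreads, m=1))
```

When every observation has the same spread, both spread mean squares are zero and the function returns the classical F statistic of the cores. A larger $m$ also moves the value toward the classical one, since the weight $1/(m+1)$ of the spread part shrinks.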

Remark 5.2
In ANOVA tests based on STFNs and NFNs, all the extended statistics introduced in Eqs. (12)-(24) reduce to the statistics presented in Sect. 3 of [21] for classical ANOVA when the spreads of the $\tilde{y}_{ij}$'s are a fixed number for $i = 1, \ldots, r$ and $j = 1, \ldots, n_i$.

Remark 5.3
The ANOVA test discussed here is different from Wu's approach [28]. His ANOVA method is constructed on the basis of the cuts of FRVs and of optimistic and pessimistic degrees obtained by optimization. Also, the work presented in [21] is based on the extension principle, which in practice can lead the user to a very fuzzy decision through a very vague Fisher statistic. The major advantages of the method presented in this paper are its efficiency for large samples and its simplicity of use in real situations.
In the next section, the presented ANOVA method will be exemplified for fuzzy observations by a relevant study.

A case study on the process of soap production
The aim of this study is to investigate the solubility of three different kinds of soap in tepid water. The treatment factor, soap, has three levels: regular, deodorant and moisturizing brands, all from the same manufacturer. This example is extracted from an experiment done by Suyapa Silvia [3], with the data fuzzified. It must be stressed that the conditions of the ANOVA problem presented in this paper are exactly the same as those of the ANOVA test presented in [21], but the solutions are different. Therefore, to shorten the paper, we do not describe the experimental process, and we refer readers to Sect. 8.2 of [21] for more details about data gathering. We now address the following question: is there a significant difference in weight loss among the three kinds of soap due to dissolution in water when they are allowed to soak for the same time?
There are some unavoidable sources of imprecision in the experiment reported by Suyapa Silvia [3], such as: (1) the inability of the experimenter to cut cubes of exactly the same weight, (2) the limited precision (10 mg) of digital laboratory scales, etc. Hence, a better plan for recording the observations is to use STFNs, some of which are definitely non-positive. Therefore, in this real case, we rewrite the data as STFNs in Table 2 to account for the unavoidable elements described above. The spread of each observed symmetric triangular fuzzy number is taken as a function of its mean in Table 2. In other words, the fuzzy observations are taken to be STFNs $\tilde{y}_{ij} = T(y_{ij}, s_{y_{ij}})$ such that $s_{y_{ij}} = y_{ij}/10$, for $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, n_i$. Note that in this study some observations are not reported as positive STFNs.
For the triangular fuzzy observations presented in Table 2, we wish to test whether the weight losses of the three soaps are the same or not. In other words, we are going to test "$H_0: \mu_1 = \mu_2 = \mu_3$" versus "$H_1$: not all $\mu_i$'s are equal, for $i = 1, 2, 3$".
Using Theorems 5.1 and 5.2, one can calculate the observed values of the ANOVA statistics based on the given STFNs; they are reported in Table 3. For instance, the total sum of squares is calculated for $m = 1$ by Theorem 5.1. By comparing the computed ANOVA test statistic with the critical value, one can accept the alternative hypothesis $H_1$ at significance level 0.05. The critical value of the ANOVA test is $F_{1-\alpha;\, r-1,\, n_t-r} = F_{0.95;\, 2,\, 9} = 4.256$, and the computed p-value $= 18 \times 10^{-5}$ strongly supports $H_1$. Therefore, we conclude that there is a relation between soap type and weight loss based on the vague data recorded in Table 2. The solution of this case study by the extension principle can be found in Sect. 8.2 of [21]; it leads the decision maker to a very vague observed Fisher statistic and, as noted in Remark 4.1, this can lower the truth level of the decision.

Conclusions and future works
In most applied sciences, there are several situations in which non-precise values can be assigned to experimental results. Fuzzy/non-precise numbers are suitable models to formalize the observed values in such situations, and the observed data can be represented by fuzzy sets in order to analyze the experiment. The main contribution of this article is to extend the ANOVA test to fuzzy observations based on a distance between symmetric fuzzy numbers. Several fast computational formulas were obtained for the cases where the observations are STFNs or NFNs. When all observations are real numbers, the presented ANOVA method reduces to the classical ANOVA approach, since the vagueness of the observed statistics is removed and only the central points of the statistics remain, as one can see in Theorems 5.1 and 5.3 and Remark 5.2. The proposed ANOVA approach is therefore a meaningful generalization of classical ANOVA. Efficiency for large samples is the major advantage of this method, and it is easy to use for practitioners who are familiar with ANOVA.
For future research, one can try to use the approach of this paper to extend other experimental designs, such as the randomized block design and the Latin square design, to the case where the observations are fuzzy numbers rather than real numbers. Another interesting topic for future work is to extend the results of this paper to trapezoidal fuzzy numbers and non-symmetric fuzzy numbers, or in general to LR fuzzy numbers.