Introduction

Hypothesis testing is a scientific process used to decide whether a proposition under consideration should be accepted or rejected. Two approaches are used in statistics to test a hypothesis: (1) the parametric approach and (2) the nonparametric approach. The most important requirement of the parametric approach is that the data satisfy the normality assumption, and a few tests also require equality of population variances [1]. In most situations, the distributional assumption of a parametric test is hardly satisfied, and the use of nonparametric or distribution-free tests is common practice [2, 3]. However, all such nonparametric tests apply to data containing determined observations. In real life, there are various scenarios in which the data are imprecise, and in such cases the existing hypothesis testing approach based on classical test statistics cannot be implemented. Recent studies have suggested nonparametric tests based on interval-valued data and fuzzy logic [4]. Smarandache [5] generalized fuzzy logic in the neutrosophic sense by considering interval-valued data and the measure of indeterminacy or falseness. Smarandache introduced Neutrosophic Statistics as a generalization of classical statistics, applied when the data under consideration are neutrosophic numbers [6]. Smarandache and Khalid [6] verified the efficiency of neutrosophic logic. Several authors have implemented neutrosophic logic for data containing uncertainty and vagueness; see refs [7,8,9,10,11].

Furthermore, several authors have developed statistical tests to analyze fuzzy data; see, for example, refs [12,13,14,15]. In fuzzy logic and neutrosophic statistics, several works have also introduced decision-making analyses for data sets containing uncertainty and vagueness [16,17,18]. Recently, Aslam introduced different statistical tests using Neutrosophic Statistics, including tests of homogeneity of variance in an uncertain environment, the goodness-of-fit test in the presence of uncertain parameters, and the Kolmogorov-Smirnov tests under uncertainty [19,20,21].

In 1952, Kruskal and Wallis [12] provided a robust rank-based test for the k-sample problem as an alternative to parametric approaches such as the one-way analysis of variance (ANOVA). The Kruskal-Wallis H test has been used for analysis in various settings; for example, see refs [13,14,15,16,17,18]. In the classical k-sample problem, the data are determined and do not contain any ambiguity or vagueness. However, in many current scientific studies, the observations are not necessarily precise, and indeterminate parts quantitatively express the uncertainties in a sample. The existing Kruskal-Wallis H test cannot be used to investigate data measured in neutrosophic intervals. A detailed literature review indicates that no test is available that can serve as a nonparametric alternative to ANOVA under an indeterminate environment. The unavailability of such a method is the motivation for the current research. The goal is to develop a test that compares several samples or groups and that is easy to apply and understand. The proposed modified Kruskal-Wallis test gives its result in interval-valued form and is preferable for data containing vagueness and uncertainty. The objectives of this article are (1) to introduce the modified neutrosophic Kruskal Wallis test; (2) to define the methodology of the neutrosophic Kruskal Wallis test; and (3) to compare the performance of the existing Kruskal Wallis test with the proposed test through an application to a Covid-19 data set under Neutrosophy. More information about the application of neutrosophic statistics can be found in [22,23,24].

The article is organized as follows. Section 2 presents the computational method for the neutrosophic Kruskal Wallis test. In Section 3, the modified Kruskal Wallis test is demonstrated with an illustrative example based on the Covid-19 data set to examine its efficiency and competence. It is anticipated that the modified nonparametric Kruskal Wallis test will analyze data in the presence of uncertainty and vagueness more effectively than the existing Kruskal Wallis test under classical statistics. Finally, the results are discussed and generalized with some concluding remarks.

Computational method of the modified Kruskal Wallis test under uncertainty

In Classical Statistics, nonparametric tests are methods of statistical analysis that do not require the data to follow a particular distribution. These tests apply to non-normal data sets, and for this reason they are sometimes referred to as distribution-free tests. The basic purpose of the proposed Kruskal Wallis test is to test whether all independent samples containing neutrosophic observations come from neutrosophic populations with equal means, implying that the populations under uncertainty are identical. The proposed nonparametric test is applicable to data for which the measure of uncertainty or the measure of falseness has been recorded. Suppose XN = aN + bNIN; XN ∈ [XL, XU] is a neutrosophic number, where the first part, aN, represents the measure of determinacy and the second part, bNIN; IN ∈ [IL, IU], represents the measure of vagueness or uncertainty. When IN = 0, the neutrosophic number reduces to a random variable under classical statistics. The neutrosophic variable XN represents the neutrosophic sample obtained from a population containing imprecise, uncertain, and indeterminate observations; for details, see ref [5].

Modified Kruskal Wallis H test

Under Classical Statistics, the Kruskal-Wallis H test is used to test the null hypothesis that all k independent samples come from populations having equal means against the alternative hypothesis that at least one population differs. The existing nonparametric test is a generalization of the two-sample Mann-Whitney U test. It is an extremely useful test when the assumption of normality does not hold or the population variances are not equal, but it cannot be applied to data under uncertainty. The modified Kruskal Wallis test under uncertainty is applicable under the following assumptions:

  1. The data consist of uncertain, imprecise, and indeterminate values.
  2. The neutrosophic samples must be random.
  3. The neutrosophic samples must be mutually independent.
  4. The test is generally considered robust to ties, but if there are ties in the data set, they should not be concentrated together in one part of the sample.

Suppose we have kN independent neutrosophic samples of sizes n1N, n2N, …, nkN (∑niN = nN). Let XiN (Xi1N, Xi2N, Xi3N, …, XinN) represent the neutrosophic observations of the ith sample. To perform this test under uncertainty, combine all nN uncertain observations from the kN samples, arrange them in ascending order of magnitude, and assign ranks to them. In the case of ties, assign the average of the tied ranks. To distinguish the neutrosophic sample observations, let the letters AN, BN, CN, … represent the observations of the first, second, third, … neutrosophic samples, respectively. The observations of the neutrosophic samples are then replaced with their corresponding ranks. Add these ranks for each sample and denote the sums by R1N, R2N, …, RkN. Now compute
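The ranking steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the data are hypothetical, each observation is assumed to be stored as a (lower, upper) pair, and the lower and upper bounds are assumed to be ranked separately, which is one reading of how the interval-valued rank sums RiN ∈ [RiL, RiU] arise. The helper names `average_ranks` and `rank_sums` are introduced here for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks for `values`; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of observations tied with order[pos].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_sums(groups):
    """Pool the groups, rank the pooled values, and return the rank sum of each group."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums


# Hypothetical interval-valued data: each observation is (lower, upper).
samples = [
    [(12, 13), (15, 15), (11, 12)],
    [(15, 16), (18, 18), (17, 19)],
    [(9, 10), (10, 10), (8, 9)],
]
# ASSUMPTION: lower and upper bounds are ranked separately.
R_lower = rank_sums([[lo for lo, _ in g] for g in samples])
R_upper = rank_sums([[up for _, up in g] for g in samples])
print(R_lower, R_upper)
```

In both cases the rank sums add up to nN(nN + 1)/2, which is a quick check that the ranking was done correctly.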

$${S}_{kN}^2=\sum_{i=1}^{k_N}\frac{R_{iN}^2}{n_{iN}};{R}_N\in \left[{R}_L,{R}_U\right];{n}_N\in \left[{n}_L,{n}_U\right];{k}_N\in \left[{k}_L,{k}_U\right]$$
(1)

and

$${S}_{rN}^2=\sum_{i,j}{r}_{ijN}^2;$$
(2)

where rijN is the rank assigned to neutrosophic observation XijN; XijN ∈ [XijL, XijU]. If there are no ties, then

$${S}_{rN}^2=\frac{n_N\left({n}_N+1\right)\left(2{n}_N+1\right)}{6}$$
(3)

The modified Kruskal-Wallis statistic HN; HN ∈ [HL, HU] is given by

$${H}_N=\frac{\left({n}_N-1\right)\left({S}_{kN}^2-{C}_N\right)}{\left({S}_{rN}^2-{C}_N\right)}$$
(4)

where CN; CN ∈ [CL, CU] denotes the appropriate correction term given by

$${C}_N=\frac{n_N{\left({n}_N+1\right)}^2}{4}$$
(5)

In case of no ties, the neutrosophic statistic HN becomes

$${H}_N=\frac{12{S}_{kN}^2}{n_N\left({n}_N+1\right)}-3\left({n}_N+1\right)$$
(6)
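The step from (4) to (6) can be checked directly. Substituting the no-ties value of the sum of squared ranks from (3) and the correction term from (5) into the denominator of (4) gives

$${S}_{rN}^2-{C}_N=\frac{n_N\left({n}_N+1\right)\left(2{n}_N+1\right)}{6}-\frac{n_N{\left({n}_N+1\right)}^2}{4}=\frac{n_N\left({n}_N+1\right)\left({n}_N-1\right)}{12}$$

so the factor (nN − 1) cancels and

$${H}_N=\frac{12\left({S}_{kN}^2-{C}_N\right)}{n_N\left({n}_N+1\right)}=\frac{12{S}_{kN}^2}{n_N\left({n}_N+1\right)}-\frac{12{C}_N}{n_N\left({n}_N+1\right)}=\frac{12{S}_{kN}^2}{n_N\left({n}_N+1\right)}-3\left({n}_N+1\right)$$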

The neutrosophic form of the proposed test HN ∈ [HL, HU] can be expressed as follows

$${H}_N={H}_L+{H}_U{I}_{NH};{I}_{NH}\in \left[{I}_{LH},{I}_{UH}\right]$$
(7)

Note here that the proposed statistic HN ∈ [HL, HU] is a generalization of the existing test under classical statistics. The first part, HL, is the determined part, HUINH denotes the indeterminate part, and INH ∈ [ILH, IUH] is the measure of indeterminacy/uncertainty. The proposed test reduces to the existing test when ILH = 0.

The neutrosophic Kruskal Wallis HN test is used to test the null hypothesis that all kN populations have identical distributions. The null hypothesis is rejected for large values of the test statistic under uncertainty. When there are only three samples, each with five or fewer neutrosophic observations, the significance of the test statistic is determined by using Kruskal and Wallis’ table [19], which gives critical values for all combinations of sample sizes up to 5,5,5. When one of the neutrosophic samples contains more than five observations, or every neutrosophic sample contains more than five observations, and the null hypothesis is true, the neutrosophic test statistic HN approximately follows a chi-square distribution with (kN − 1) degrees of freedom.

Application of the proposed modified Kruskal Wallis H test

To apply the proposed neutrosophic Kruskal Wallis test, data representing the daily ICU occupancy by Corona-positive patients, recorded in Pakistan, have been considered. The hypothesis under investigation is whether there is a statistically significant difference in the daily ICU occupancy of Covid-19 patients across age groups. Neutrosophy or uncertainty is introduced in the data for a better illustration. The neutrosophic Kruskal Wallis test is applied to test the null hypothesis that there is no difference in the daily ICU occupancy of Covid-19 patients among different age groups in Pakistan during December 2020. Daily ICU occupancy of Covid-19 patients aged 55 and above is shown in Fig. 1, daily ICU occupancy of Covid-19 patients aged 35–55 is shown in Fig. 2, and daily ICU occupancy of Covid-19 patients aged 35 and below is shown in Fig. 3.

Fig. 1 Daily ICU occupancy of Covid-19 patients aged 55 and above

Fig. 2 Daily ICU occupancy of Covid-19 patients aged 35–55

Fig. 3 Daily ICU occupancy of Covid-19 patients aged 35 and below

The neutrosophic null and alternative hypotheses for the neutrosophic data given in Table 1 are as follows: the average daily ICU occupancy of Covid-19 patients is equal for all three age groups, against the alternative hypothesis that the average daily ICU occupancy differs for at least two of the three age groups. Table 1 contains data on the daily ICU occupancy of Corona-positive patients for three age groups. By combining the data, arranging them in ascending order, and assigning ranks, it is found that ties exist in the neutrosophic data set for both the determinate and indeterminate parts. Therefore, the neutrosophic statistic given in (4) applies to this data set containing the measure of uncertainty. Here R1N = [757.5, 766.5], R2N = [775.5, 767], and R3N = [297, 296.5], where R1N, R2N, and R3N represent the sums of ranks of age groups 1, 2, and 3, respectively.

Table 1 Daily ICU occupancy of Covid-19 patients in Pakistan in December 2020

From (1) and (2), we have

$${S}_{kN}^2=\sum_{i=1}^{k_N}\frac{R_{iN}^2}{n_{iN}}=\left[63170.78,63186.18\right]$$

and

$${S}_{rN}^2=\sum_{i,j}{r}_{ijN}^2=\left[73803.5,73802\right]$$

From (4)

$${H}_N=\frac{\left({n}_N-1\right)\left({S}_{kN}^2-{C}_N\right)}{\left({S}_{rN}^2-{C}_N\right)}=\left[24.12,24.17\right]$$
$${p}_N\text{-value}=\left[0.000011,0.000011\right]$$
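The arithmetic behind these intervals can be retraced from the reported rank sums and sums of squared ranks. The sketch below applies (1), (5), and (4) to the lower and upper values separately. The group sizes are not stated in this section; 20 observations per group is an assumption made here because the reported rank sums add up to 1830 = 60·61/2, which corresponds to nN = 60. The function name `kruskal_wallis_h` is introduced for illustration only.

```python
def kruskal_wallis_h(rank_sums, group_sizes, sum_sq_ranks):
    """Tie-corrected H from Eq. (4): (n - 1)(S_k^2 - C) / (S_r^2 - C).

    Returns (H, S_k^2) so the intermediate quantity of Eq. (1) can be inspected.
    """
    n = sum(group_sizes)
    s_k2 = sum(r * r / m for r, m in zip(rank_sums, group_sizes))  # Eq. (1)
    c = n * (n + 1) ** 2 / 4                                       # Eq. (5)
    return (n - 1) * (s_k2 - c) / (sum_sq_ranks - c), s_k2


# ASSUMPTION: 20 observations per age group (inferred, not stated in the text).
sizes = [20, 20, 20]

# Rank sums and sums of squared ranks as reported for the lower and upper values.
h_lower, s_k2_lower = kruskal_wallis_h([757.5, 775.5, 297.0], sizes, 73803.5)
h_upper, s_k2_upper = kruskal_wallis_h([766.5, 767.0, 296.5], sizes, 73802.0)

print(round(s_k2_lower, 3), round(s_k2_upper, 3))
print(h_lower, h_upper)
```

Under this assumption the two values of the sum in (1) round to the reported 63170.78 and 63186.18, and the two H values fall within 0.01 of the reported interval [24.12, 24.17].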

Assuming the level of significance to be 1%, the critical region is HN > χ²(0.01; 2) = 9.21, the upper 1% point of the chi-square distribution with 2 degrees of freedom. Since the calculated value of the test statistic based on neutrosophic observations lies in the critical region (p-value < α), we reject the neutrosophic null hypothesis and conclude that the daily ICU occupancy of Covid-19 patients is not the same across age groups.
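The critical value used here can be reproduced without tables. The sketch below relies on the fact that, for 2 degrees of freedom, the chi-square survival function is exp(−x/2), so the upper-α critical value is −2 ln α. It therefore covers only the three-group case; for other numbers of groups a statistical library would be needed (for example `scipy.stats.chi2.ppf`). Treating the test as significant only when both ends of the interval exceed the critical value is an assumption about how the interval-valued decision is read, and the function name is hypothetical.

```python
import math


def chi2_critical_df2(alpha):
    """Upper-alpha critical value of the chi-square distribution with 2 df.

    Valid for 2 degrees of freedom only: P(X > x) = exp(-x / 2),
    so the critical value is -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


alpha = 0.01
h_interval = (24.12, 24.17)  # reported neutrosophic statistic [H_L, H_U]

critical = chi2_critical_df2(alpha)
# ASSUMPTION: reject only if both ends of the interval lie in the critical region.
reject = all(h > critical for h in h_interval)
print(round(critical, 2), reject)
```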

Advantages of the proposed test

In this section, the efficiency of the proposed test HN ∈ [HL, HU] is compared with the existing test under classical statistics in terms of the measure of uncertainty. As mentioned earlier, the neutrosophic form HN = HL + HUINH; INH ∈ [ILH, IUH] consists of a determinate part (the existing test) and an indeterminate part. The neutrosophic form of HN ∈ [HL, HU] for the real data is HN = 24.12 + 24.17INH; INH ∈ [0, 0.002], where the first value, 24.12, is the result of the existing test when ILH = 0 and 24.17INH is the indeterminate part. Note that the measure of indeterminacy associated with the test HN ∈ [HL, HU] is 0.002. The proposed test thus gives the value of the test statistic as a range from 24.12 to 24.17, whereas the existing test provides only a determined (exact) value. In addition, the proposed test HN ∈ [HL, HU] gives information about the measure of uncertainty. Based on this information, the proposed test can be interpreted as follows: when the level of significance is α = 0.05, the chance of rejecting the null hypothesis when it is true is 0.05, and the probability of accepting the null hypothesis is 0.95, with a chance of uncertainty of 0.002. From these comparisons, it can be concluded that the proposed test HN ∈ [HL, HU] gives more information than the existing test. In addition, the proposed test is flexible, adequate, and effective under uncertainty compared with the existing test.
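The text does not spell out how the upper bound 0.002 on INH is obtained. One convention that reproduces it is to take the upper indeterminacy as (HU − HL)/HU, so that HL + HU·I equals HU at the upper end of the range. The sketch below uses that convention as an assumption; the function name `neutrosophic_form` is hypothetical.

```python
def neutrosophic_form(lower, upper):
    """Express the interval [lower, upper] as lower + upper * I, with I in [0, i_max].

    ASSUMPTION: i_max = (upper - lower) / upper, so that lower + upper * i_max == upper.
    Requires upper != 0.
    """
    i_max = (upper - lower) / upper
    return lower, upper, i_max


h_l, h_u, i_max = neutrosophic_form(24.12, 24.17)
print(f"H_N = {h_l} + {h_u} I_NH; I_NH in [0, {i_max:.3f}]")
```

With I fixed at 0 the expression collapses to HL alone, which matches the statement that the proposed test reduces to the existing test when ILH = 0.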

Conclusion and discussion

This article proposed a modified form of the rank-based nonparametric Kruskal Wallis H test for observations containing the measure of uncertainty or the measure of falseness when comparing k samples. It is evident from Table 1 that the uncertain data used for illustration reduce to the determined part under classical statistics if no uncertain observations are recorded. For example, in the first group, the first observation, 443, represents the determinate part of the indeterminate interval, and the second value, 450, represents the indeterminate part of the interval. The modified Kruskal-Wallis test yields an indeterminacy interval rather than a single determined value, which implies that the proposed test provides a good measure of uncertainty. Recent studies also show that methods dealing with interval-valued data are more suitable in an indeterminate environment than classical statistical techniques [25, 26]. The work was originally motivated by the extensive research on fuzzy logic and neutrosophic statistics for interval-valued data sets. The proposed nonparametric test can be readily applied to compare k samples, testing the hypothesis that they have equal means.

The application of this new neutrosophic Kruskal-Wallis test on the Covid-19 data set showed that the proposed test provides more relevant and adequate results. The data representing the daily ICU occupancy by the Covid-19 patients were recorded for both determinate and indeterminate parts. The existing nonparametric Kruskal Wallis H test under Classical Statistics would have given misleading results. The proposed test showed that at a 1% level of significance, there is a statistically significant difference among the average daily ICU occupancy by corona-positive patients of different age groups.

The modified Kruskal Wallis test can be used to compare the averages of several samples or groups; the proposed test is easy to apply and understand. The Neutrosophic Kruskal Wallis test yields an uncertain interval, which is well suited to data measured from a complex system. The application of the proposed test is recommended for different fields, including biomedical sciences, engineering, and many other statistical areas. On the other hand, applying nonparametric tests under classical statistics to data containing vagueness can produce misleading results. In conclusion, the proposed neutrosophic nonparametric test provides data analysts with an efficient tool for analyzing k samples in the presence of uncertainty and indeterminacy. Further properties of the modified Kruskal-Wallis test, and its evaluation using different measures, can be studied in future research.