Introduction

The semi-average method is a simple and widely used method in time series analysis for estimating the trend in time series data. In this method, the time series is divided into two parts and the average of each part is calculated. The method is used to fit the trend in the data and to forecast future values, and it can be applied in a variety of fields for estimation and forecasting purposes. Although the method is simple, objective, and gives identical trend values whoever applies it, it is a crude method. The application of time series methods in organizational research can be seen in [1]. Further applications of time series analysis can be seen in [2, 3]. The application of the method to geographical data can be seen in [4]. Kosiorowski et al. [5] proposed Wilcoxon statistics for time series data.

The estimation and forecasting of wind speed cannot be done without the use of appropriate statistical techniques. There are many studies on the use of statistical methods in the fields of energy and weather. Applications of statistical distributions to wind speed data and of statistical methods to wind speed forecasting can be seen in [6,7,8,9,10,11,12].

It is important to note that classical statistics can be applied for forecasting and estimation purposes only when the time series data have precise, certain and determinate observations. The use of such statistical techniques in an uncertain environment may mislead the expert in estimating or forecasting wind speed. Therefore, statistical methods using fuzzy logic are applied to deal with this type of data. The applications of fuzzy-based statistical methods in estimating or forecasting can be seen in [13,14,15,16].

Smarandache [17] discussed the advantages of neutrosophic logic over fuzzy logic. Based on this logic, the idea of neutrosophic statistics was introduced in [18]. More information on neutrosophic logic can be seen in [19, 20]. Smarandache [18, 21, 22] showed that neutrosophic statistics (NS) is more effective than classical statistics in an uncertain environment. Smarandache [23] proved the efficiency of NS over interval statistics and classical statistics. NS can be applied when imprecise, vague and uncertain observations are recorded in the time series data. Guan et al. [24] introduced a new perspective for time series forecasting, considering the quantification of inconsistency as a key characteristic. Abdel-Basset et al. [25] presented a novel neutrosophic forecasting approach using neutrosophic time series (NTS), transforming historical data into NTS data with truth, indeterminacy, and falsity functions. Their method includes determining neutrosophic logical relationship groups (NLRGs) and providing a deneutrosophication method for NTS. Singh et al. [26] applied a neutrosophic set-based clustering algorithm (NEBCA) to fMRI time series datasets, specifically focusing on working memory tasks and resting-state data. Aslam and Albassam [27] introduced the method of least squares under NS. More advantages and applications of NS can be seen in [28, 29].

Aslam and Albassam [27] made a valuable contribution by introducing the least squares method under NS for wind speed data forecasting. However, a thorough exploration of the literature on NS for time series analysis reveals a significant gap: no previous research has focused on the semi-average method under NS. This gap presents an opportunity to introduce the semi-average method under NS in this paper and thereby bring a novel approach to time series analysis within the framework of NS. Furthermore, we intend to demonstrate the practicality and effectiveness of the proposed method by applying it to wind speed data and comparing its performance against the semi-average method under classical statistics. This comparative analysis is intended to showcase the novelty of the approach and to show its advantages over the existing method.

Preliminaries

Let \({Y}_{1N}={Y}_{1L}+{Y}_{1U}{I}_{1N};{I}_{1N}\in \left[{I}_{1L},{I}_{1U}\right]\), \({Y}_{2N}={Y}_{2L}+{Y}_{2U}{I}_{2N};{I}_{2N}\in \left[{I}_{2L},{I}_{2U}\right]\),…,\({Y}_{nN}={Y}_{nL}+{Y}_{nU}{I}_{nN};{I}_{nN}\in \left[{I}_{nL},{I}_{nU}\right]\) represent the time variables under neutrosophic statistics. Let \({n}_{N}\in \left[{n}_{L},{n}_{U}\right]\) be the neutrosophic sample size and \({I}_{N}\in \left[{I}_{L},{I}_{U}\right]\) be the measure of uncertainty/indeterminacy. Let \({Y}_{1L}\),\({Y}_{2L}\),…,\({Y}_{nL}\) denote the determinate (lower) values of the time series. Let \({X}_{1N}={X}_{1L}+{X}_{1U}{I}_{1N};{I}_{1N}\in \left[{I}_{1L},{I}_{1U}\right]\) and \({X}_{2N}={X}_{2L}+{X}_{2U}{I}_{2N};{I}_{2N}\in \left[{I}_{2L},{I}_{2U}\right]\) represent the coded values for the first and second halves of the time data. For more details, see [18, 21, 22]. The neutrosophic averages of the two halves can be computed as follows.

Step-1: The neutrosophic average of the 1st half is calculated as

$${\overline{Y} }_{1N}=\frac{1}{{n}_{1N}}\sum_{i=1}^{{n}_{1N}}{Y}_{i}$$
(1)

Step-2: The neutrosophic average of the 2nd half is calculated as

$${\overline{Y} }_{2N}=\frac{1}{{n}_{2N}}\sum_{i=1}^{{n}_{2N}}{Y}_{i}$$
(2)

Here \({n}_{1N}\) and \({n}_{2N}\) denote the numbers of observations in the 1st and 2nd halves, and each sum runs over the observations of the corresponding half.

Step-3: The neutrosophic form of both averages is given by

$$\overline{Y}_{1N} \, = \,\overline{Y}_{1L} + \,\overline{Y}_{1U} I_{N} ;\,\overline{Y}_{2N} = \,\overline{Y}_{2L} + \,\overline{Y}_{2U} I_{N} ;\,I_{N} \in \left[ {I_{L} ,\,I_{U} } \right]$$
(3)
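
Steps 1 to 3 can be illustrated with a minimal Python sketch. It assumes that each observation is stored as a pair of lower and upper values, that the two bounds are averaged separately, and that the middle observation is left out when the number of observations is odd. None of these choices is stated in the text, and the function name and the data are hypothetical.

```python
from statistics import mean


def neutrosophic_half_averages(lower, upper):
    """Average the lower and upper bounds over each half of the series.

    Assumptions (not stated in the paper): the bounds are averaged
    separately, and for an odd number of observations the middle one is
    left out so that both halves have the same length.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    n = len(lower)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    first, second = slice(0, half), slice(n - half, n)
    y1 = (mean(lower[first]), mean(upper[first]))    # (Y1L_bar, Y1U_bar)
    y2 = (mean(lower[second]), mean(upper[second]))  # (Y2L_bar, Y2U_bar)
    return y1, y2


# Hypothetical daily minimum and maximum wind speeds (mph), not the paper's data.
daily_min = [0.0, 0.0, 1.0, 0.0, 2.0, 1.0]
daily_max = [8.0, 10.0, 9.0, 12.0, 11.0, 13.0]
first_half_avg, second_half_avg = neutrosophic_half_averages(daily_min, daily_max)
print(first_half_avg, second_half_avg)
```

Each returned pair corresponds to the interval \(\left[{\overline{Y} }_{1L},{\overline{Y} }_{1U}\right]\) or \(\left[{\overline{Y} }_{2L},{\overline{Y} }_{2U}\right]\) that enters Eq. (3).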

Semi-average method under indeterminacy

The neutrosophic time series data consist of neutrosophic numbers having a lower and an upper value of the variable of interest. Following the classical semi-average method, the neutrosophic time series data are divided into two halves. The neutrosophic averages of the 1st half and the 2nd half are computed and placed at the center of each half. Let \({\overline{Y} }_{1N}\in \left[{\overline{Y} }_{1L},{\overline{Y} }_{1U}\right]\) be the neutrosophic average of the 1st half and \({\overline{Y} }_{2N}\in \left[{\overline{Y} }_{2L},{\overline{Y} }_{2U}\right]\) be the neutrosophic average of the 2nd half. Let \({X}_{1N}\) and \({X}_{2N}\) denote the coded values of each half, respectively. The proposed semi-average method under neutrosophic statistics is expressed as

$$\left( {\hat{Y}_{N} - \left( {\overline{Y}_{1L} + \overline{Y}_{1U} I_{N} } \right)} \right) = \frac{{\left( {\left( {\overline{Y}_{2L} + \overline{Y}_{2U} I_{N} } \right) - \left( {\overline{Y}_{1L} + \overline{Y}_{1U} I_{N} } \right)} \right)}}{{\left( {X_{2N} - X_{1N} } \right)}}\left( {X_{N} - X_{1N} } \right);\hat{Y}_{N} \in \left[ {\hat{Y}_{L} ,\hat{Y}_{U} } \right],I_{N} \in \left[ {I_{L} ,I_{U} } \right]$$
(4)

The regression line under the proposed method is given by

$$\hat{Y}_{N} = a_{N} + b_{N} X_{N} ;\hat{Y}_{N} \in \left[ {\hat{Y}_{L} ,\hat{Y}_{U} } \right]$$
(5)

where \({a}_{N}\epsilon \left[{a}_{L},{a}_{U}\right]\) is the intercept under neutrosophic statistics and is computed as

$$a_{N} = \left( {\overline{Y}_{1L} + \overline{Y}_{1U} I_{N} } \right) - b_{N} X_{1N} ;a_{N} \in \left[ {a_{L} ,a_{U} } \right],b_{N} \in \left[ {b_{L} ,b_{U} } \right],I_{N} \in \left[ {I_{L} ,I_{U} } \right]$$
(6)

where \({b}_{N}\epsilon \left[{b}_{L},{b}_{U}\right]\) is the slope of the regression line and can be computed as.

$$\,b_{N} = \frac{{\,\left( {\left( {\overline{Y}_{2L} \, + \,\overline{Y}_{2U} I_{N} } \right)\, - \,\left( {\overline{Y}_{1L} \, + \,\overline{Y}_{1U} I_{N} } \right)} \right)}}{{\left( {X_{2N} - X_{1N} } \right)}};\,b_{N} \in \left[ {b_{L} ,b_{U} } \right]\,$$
(7)

The necessary steps to show the derivation of the above equations are shown in the appendix.
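
Expanding Eq. (4) gives \(\hat{Y}_{N} = \left(\overline{Y}_{1N} - b_{N} X_{1N}\right) + b_{N} X_{N}\), which matches Eqs. (5) to (7). The following Python sketch computes the slope and intercept intervals from the two half averages. It assumes crisp (non-interval) coded values and evaluates each bound separately, which is one possible reading of the equations rather than the authors' implementation. The function name and inputs are hypothetical.

```python
def semi_average_line(y1, y2, x1, x2):
    """Return ((a_L, a_U), (b_L, b_U)) for the semi-average trend line.

    y1, y2: (lower, upper) averages of the 1st and 2nd halves.
    x1, x2: coded values at the centres of the two halves, assumed crisp.
    Each bound is handled separately (an assumption), so the pairs are
    ordered by bound (lower series first), not by magnitude.
    """
    if x1 == x2:
        raise ValueError("coded values of the two halves must differ")
    # Eq. (7): slope from the difference of the half averages.
    b = tuple((y2[k] - y1[k]) / (x2 - x1) for k in range(2))
    # Eq. (6): intercept from the 1st half average.
    a = tuple(y1[k] - b[k] * x1 for k in range(2))
    return a, b


# Hypothetical half averages, with the origin at the centre of the 1st half.
a, b = semi_average_line(y1=(0.5, 9.0), y2=(2.0, 12.0), x1=0, x2=3)
print(a, b)
```

With the origin placed at the centre of the 1st half, the intercept coincides with the 1st half average, and evaluating the line at \(X_{2N}\) returns the 2nd half average.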

Application using wind speed data

As mentioned earlier, wind speed data are usually recorded as a minimum value and a maximum value. Therefore, we use wind speed data (mph) and apply the proposed semi-average method to them. The data were recorded by the Meteorology department in Punjab, Pakistan. Wind speed data recorded in intervals can be forecast more adequately using neutrosophic statistics than using the semi-average method under classical statistics. The data for some selected months, taken from [30], are shown in Tables 1, 2 and 3.

Table 1 Trended value under neutrosophic statistics for January
Table 2 Trended value under neutrosophic statistics for February
Table 3 Trended value under neutrosophic statistics for March

Some necessary computations for the application of the proposed method are shown as follows.

The regression line under neutrosophic statistics for January 2020 is expressed as

$${\widehat{Y}}_{N}=\left[\mathrm{0,8.86}\right]+[\mathrm{0,0.43}]{X}_{N}$$

The regression line under neutrosophic statistics for February 2020 is expressed as

$${\widehat{Y}}_{N}=\left[\mathrm{0.1428,10.57}\right]+[\mathrm{0.02,0.19}]{X}_{N}$$

The regression line under neutrosophic statistics for March 2020 is expressed as

$${\widehat{Y}}_{N}=\left[\mathrm{1.4,15.33}\right]+[0.87,-\,0.29]{X}_{N}$$

The regression line for January 2020 shows that the intercept lies between 0 and 8.86, and the slope lies in the indeterminate interval from 0 to 0.43. For February 2020, the intercept lies between \(0.1428\) and \(10.57\), and the slope lies in the indeterminate interval from \(0.02\) to \(0.19\). For March 2020, the intercept lies between \(1.4\) and \(15.33\), and the slope lies in the indeterminate interval from \(0.87\) to \(-\,0.29\). Using these regression lines, the trended values are calculated and placed in Tables 1, 2 and 3. From Tables 1, 2 and 3, it can be noted that the lower (minimum) value of wind speed is mostly zero. Therefore, a larger indeterminacy can be expected in forecasting using these indeterminate data. From Table 3, the trended value for 4th March 2020 in neutrosophic form is \({\widehat{Y}}_{N}=0.0628+9.81{I}_{N};{I}_{N}\epsilon \left[\mathrm{0,0.99}\right]\). From this, it can be forecast that the wind speed will be between 0.0628 and 9.81 mph, with a measure of indeterminacy of 0.99.

The fitted trend lines and the real wind speed (mph) for the 3 months are shown in Fig. 1. From Fig. 1, it can be seen that for January, the values of the first 11 days are close to the fitted regression lines, while the values of the remaining days are away from the upper value of the trended line. The actual values for February and March are away from the upper values of the trended lines. On the other hand, the lower values of the actual data are very close to the lower values of the trended lines for all 3 months. Figure 1 shows that indeterminate wind speed (mph) should be forecast using the semi-average method under neutrosophic statistics. For this type of time series data, the use of the existing semi-average method under classical statistics may mislead decision-makers. Based on the wind speed data, it can be concluded that the proposed semi-average method under neutrosophic statistics is adequate and suitable for forecasting purposes.
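
As an illustration of how a trended value follows from a fitted line, the sketch below evaluates the February 2020 regression line bound by bound. The coded value \(X_{N}=-3\) is an assumption: it is not given in the text, and is chosen here only because it reproduces the trended value \(0.0828+10{I}_{N}\) quoted later in the paper for 5th February 2020. The function name is hypothetical.

```python
def trend_interval(intercept, slope, x):
    """Evaluate a neutrosophic trend line bound by bound.

    intercept, slope: (lower, upper) pairs taken from the fitted line.
    x: crisp coded time value (an assumption; the paper's coding is not shown).
    """
    return (intercept[0] + slope[0] * x, intercept[1] + slope[1] * x)


# February 2020 line from the text: [0.1428, 10.57] + [0.02, 0.19] X_N.
feb_intercept = (0.1428, 10.57)
feb_slope = (0.02, 0.19)

low, high = trend_interval(feb_intercept, feb_slope, x=-3)
print(round(low, 4), round(high, 2))  # approximately 0.0828 and 10.0
```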

Fig. 1 The trend lines and actual observations for January, February and March 2020. Figure 1 was generated using R version 3.2.1 (https://www.r-project.org/)

Comparative study based on wind speed data

The proposed semi-average method under neutrosophic statistics is a generalization of the semi-average method under classical statistics. The equations of the proposed method reduce to the equations under classical statistics when \({I}_{N}=0\). The efficiency of the proposed method is compared with the existing semi-average method under classical statistics in terms of information and adequacy. For comparison purposes, the trended values, their neutrosophic forms, and the measure of indeterminacy are presented in Tables 1, 2 and 3. In each neutrosophic form in Tables 1, 2 and 3, the first value is the trend value under classical statistics and the second value is the indeterminate part. Each neutrosophic form reduces to the determinate part when \({I}_{N}=0\). For example, the trended value for 5th February 2020 in neutrosophic form is \({\widehat{Y}}_{N}=0.0828+10{I}_{N}\). In this neutrosophic form, the first value \(0.0828\) is the forecasted value under classical statistics. The second value \(10{I}_{N}\) denotes the indeterminate part, and the measure of indeterminacy is \({I}_{N}\epsilon \left[\mathrm{0,0.99}\right]\). This illustrates the advantage of the proposed model: in the presence of uncertainty, it provides the forecast as an indeterminate interval rather than as an exact value. In addition, the proposed method gives information about the measure of indeterminacy that cannot be obtained from the analysis under classical statistics. Note that the measures of indeterminacy are higher in January and February. In March, after the 6th day, smaller values of the measure of indeterminacy in the trend values are found. From the analysis, it is clear that the proposed semi-average method under neutrosophic statistics is more flexible than the classical method. In addition, in the presence of a high measure of indeterminacy, the forecast may be affected and may mislead decision-makers. We also note that for interval data, it would be better to use neutrosophic statistics to forecast wind speed.
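
The relation between an interval forecast and its neutrosophic form can be sketched as follows. The sketch assumes the convention, common in the neutrosophic statistics literature but not stated explicitly in this text, that the upper bound of the measure of indeterminacy is \({I}_{U}=\left({Y}_{U}-{Y}_{L}\right)/{Y}_{U}\). Under that assumption the 5th February 2020 values reproduce the reported 0.99, and setting \({I}_{N}=0\) recovers the classical value. The function names are hypothetical.

```python
def neutrosophic_form(lower, upper):
    """Return (determinate, coefficient, i_upper) for lower + upper * I_N.

    Assumption: I_N ranges over [0, i_upper] with
    i_upper = (upper - lower) / upper, and i_upper = 0 when upper is 0.
    """
    i_upper = 0.0 if upper == 0 else (upper - lower) / upper
    return lower, upper, i_upper


def evaluate(form, i_n):
    """Evaluate determinate + coefficient * I_N for an admissible I_N."""
    determinate, coefficient, i_upper = form
    if not 0 <= i_n <= i_upper:
        raise ValueError("I_N must lie in [0, I_U]")
    return determinate + coefficient * i_n


# Trended value for 5th February 2020 from the text: 0.0828 + 10 I_N.
form = neutrosophic_form(0.0828, 10.0)
print(round(form[2], 2))          # measure of indeterminacy, rounded
print(evaluate(form, 0.0))        # classical (determinate) value
print(evaluate(form, form[2]))    # upper end of the forecast interval
```

Evaluating at \({I}_{N}=0\) returns the determinate part only, which is the sense in which the neutrosophic form reduces to the classical forecast.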

Conclusions

This paper introduced a modified version of the semi-average method within the context of neutrosophic statistics. The modified semi-average method is a generalization of the existing approach. Based on the methodology and application, it can be concluded that the proposed semi-average method is well-suited for situations where data are recorded in intervals. In contrast, using the existing semi-average method for wind speed prediction with interval data may lead decision-makers astray. It is important to note that the proposed method has its limitations and can only be applied when dealing with imprecise observations within time series data. In addition, the neutrosophic semi-average method utilizes less information compared to other time series methods. The applications of the proposed method extend to various fields such as meteorology for weather forecasting, business, and medical science. As future research, other time series methods can be developed within the framework of neutrosophic statistics, which would open up possibilities for additional advancements in the field.