1 Introduction

Forecasting is the scientific basis for sound planning in many areas. Because of this important role, it has long attracted the attention of managers and scientists. Despite extensive discussion in the literature, the problems of forecasting have not yet been completely solved [1, 21]. In statistics, time series and regression models are popular forecasting tools, but they have several practical disadvantages. A regression model imposes conditions on the data that real data often do not satisfy, so its forecasting results are frequently limited [2, 5, 22, 35].

In socioeconomic development, every field and country has accumulated a large amount of data over time, so time series has become the most common data type, and forecasting from such data is an attractive research direction. Time series models fall into two kinds: non-fuzzy time series (NFS) and fuzzy time series (FS) models. Although NFS models often outperform regression models in real applications, they have some limitations. For example, they only give remarkable results if the series changes normally or is stationary [46]. Because an NFS model fits a mathematical function to the historical data, it is not very flexible, and its main limitation is that it does not rely on linguistic levels to relate the elements of the series. FS models, which are built on fuzzy relations among the elements of a series, overcome these weaknesses of NFS models.

The FS model has been developed in two main directions. The first builds models from the original series and forecasts directly from them; Abbasov and Mamedova [1] and Tai et al. [46] have made important contributions in this direction. In the second, the original series is interpolated to obtain a new series whose elements are closely related throughout, and this new series is then used as improved input data for forecasting. To our knowledge, the second direction has received more attention than the first. Song et al. [42] pioneered this direction with data on the enrollment of the University of Alabama (EnrollmentUA). Qiang et al. [43] used triangular fuzzy relations. Ming et al. [14] and Chen et al. [16] improved the result of [43] by taking the fuzzy level into account. Huarng [27] and Own [37] presented heuristic FS models that use heuristic knowledge to improve the forecast for EnrollmentUA. Using neural networks, Alpaslan [7] obtained interesting results in some cases. Wu and Chau [50] constructed several soft computing approaches for rainfall prediction, considering two aspects to improve accuracy: a data preprocessing procedure and a modular method. The preprocessing techniques included the moving average (MA) and singular spectrum analysis (SSA), while the modular models combined local support vector regression (SVR) models and local artificial neural network (ANN) models. The results showed that the MA was superior to the SSA when coupled with the ANN. Taormina and Chau [48] proposed ANN-based interval forecasting of streamflow discharges using lower and upper bounds and a multi-objective fully informed particle swarm. Ghalandari et al. [24] introduced an aeromechanical optimization technique for first-row compressor test stand blades using a hybrid machine learning model combining a genetic algorithm and an artificial neural network; using three-dimensional geometric parameters for blade tuning, they increased the reduced frequency by at least 5% in both the stall and classical regions while satisfying the force response constraints.

Based on fuzzy models with different linguistic levels, many researchers [25, 34, 49] have proposed new models. Moreover, Baghban et al. [10] used an adaptive network-based fuzzy inference system (ANFIS) that provided highly accurate predictions; their study used temperature, nanoparticle diameter, nanofluid density, volumetric fraction, and viscosity of the base fluid as independent variables. Prashant et al. [38] presented a fuzzy dominance-based analytical sorting method as an advancement over existing multi-objective evolutionary algorithms: the objective functions are defined as fuzzy objectives, and competing solutions receive an overall activation score based on their respective fuzzy objective values. Recently, Tai [47] proposed an FS model built from the results of a fuzzy clustering problem. In recent years, many applications have used optimization techniques such as the bat algorithm [51], the genetic algorithm [38], and the whale algorithm [36]. In this study, we apply the genetic algorithm (GA) to the FS model. Applying the GA in clustering, Jain [30] proposed an FS model for EnrollmentUA. Ali et al. [6] proposed a GA-based method for generation expansion planning (GEP) in the presence of wind power plants; a six-state model was used to obtain the wind farm output power, and they described how to compute this six-state output model, incorporating the forced outage rate of the wind farm units, for long-term GEP calculations. Also using a GA, Aldouri et al. [3] introduced a two-level model: the first level implements the GA on top of an autoregressive integrated moving average (ARIMA) model, and the second level operates on the forecasting error rate.

In a time series, the value at time t is called an element, and the universal set contains all the elements used as input data for forecasting. In the second direction, building a time series model has three main phases: (1) build the universal set and divide it into suitable groups, (2) determine the elements of each group, and (3) establish the relationship between each element of the series and the groups found in (2) to build the forecasting rule. For (1), many authors used the original series itself as the universal set [13, 16, 23], while others built the universal set from the maximum and minimum values of the original series [13, 16]. In addition, Huarng et al. [27, 28] proposed two new techniques for finding intervals based on the mean of the distributions, and Abbasov et al. [1] and Tai [46] built the universal set from the changes of the data between consecutive periods or their percentage changes. Dividing the universal set into an appropriate number of groups is an important problem because it influences the result of the model. Almost all existing models fix this number to a specific constant (often five or seven); others determine it from experiments on many data sets. However, the number of groups is only suitable if it depends on the similarity level of the elements in the series: when the elements differ greatly, the number of groups should be large, and vice versa. In this study, we use the whole series as the universal set and determine its number of groups with an automatic clustering algorithm. Through this algorithm, series whose elements have different degrees of similarity will receive different numbers of clusters.

For (2), many authors divided the universal set into equal intervals; the elements of each cluster have also been determined by the k-means algorithm [9], and Tai [47] built groups for the elements with a clustering algorithm. In this study, we propose a cluster analysis method using the genetic algorithm to find the specific elements of each group. For (3), several important studies have been performed: Song et al. [42] used matrix operations, Chen et al. [13] used fuzzy logic relations, and many authors [4, 19, 20, 28] used artificial neural networks to determine the fuzzy relations.

In addition, fuzzy relationships based on triangular and trapezoidal fuzzy numbers were considered in [25], and a relationship based on a clustering algorithm was established by Tai and Nghiep [47]. Many researchers have also used either the centroid method [13, 27, 28] or the adaptive expectation method [5, 16, 47]. This article contributes to all three stages (1), (2), and (3) of the FS model:

For (1), we propose a new algorithm for finding the appropriate number of groups into which each series is divided. This value depends on the similarity level of the objects in the series, an outstanding advantage over the existing methods, which fix the number of linguistic levels to a constant (usually five or seven in applications).

For (2), we propose an improved fuzzy genetic algorithm. It finds the specific elements of each group and the probability with which each element of the series belongs to the established groups.

For (3), based on the principle for normalizing the series and the result from (2), a new interpolation method is proposed.

Incorporating all these improvements, we propose a new model for time series. Across many well-known data sets, this model outperforms the existing ones. We also provide a MATLAB procedure for the proposed model that performs effectively on real data. In addition, we apply the proposed model to forecast the number of COVID-19 victims in Vietnam.

The rest of the paper is structured as follows. Section 2 presents some definitions related to fuzzy time series, proposes the new model, and proves its convergence. Section 3 gives the specific steps of the new model and compares it with existing ones over many data sets. An application of the proposed model in Vietnam is presented in Sect. 4. The final section concludes.

2 The proposed algorithm

2.1 The parameters to evaluate the established model

Given a series of historical data \(\left\{ {{X_i}} \right\}\) and predicted values \(\left\{ {{{\hat{X}}_i}} \right\}\), \(i = 1,2, \ldots ,N\), the popular parameters used to evaluate the built FS models are as follows:

Mean squared error:

$$\begin{aligned}MSE = \frac{1}{N}\sum \limits _{i = 1}^N {{{\left( {{{{\hat{X}}}_i} - {X_i}} \right) }^2}}. \end{aligned}$$

Mean absolute error:

$$\begin{aligned}MAE = \frac{1}{N}\sum \limits _{i = 1}^N {\left| {{{{\hat{X}}}_i} - {X_i}} \right| }. \end{aligned}$$

Mean absolute percentage error:

$$\begin{aligned}MAPE = \frac{1}{N}\sum \limits _{i = 1}^N {\left( {\frac{{\left| {{{{\hat{X}}}_i} - {X_i}} \right| }}{{{X_i}}} \times 100} \right) }. \end{aligned}$$

Symmetric mean absolute percentage error:

$$\begin{aligned}SMAPE = \frac{1}{N}\sum \limits _{i = 1}^N {\left( {\frac{{\left| {{{{\hat{X}}}_i} - {X_i}} \right| }}{{({X_i} + {{{\hat{X}}}_i})/2}} \times 100} \right) }. \end{aligned}$$

Mean absolute scaled error:

$$\begin{aligned}MASE = \frac{{\sum \nolimits _{i = 1}^N {|{{{\hat{X}}}_i} - {X_i}|} }}{{\frac{N}{{N - 1}}\sum \nolimits _{i = 2}^N {|{X_i} - {X_{i - 1}}|} }}. \end{aligned}$$

For the built models, the smaller these parameters are, the better the models are.
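To make these criteria concrete, the following sketch computes all five of them (in Python/NumPy for illustration; our own implementation is in MATLAB). The function name and interface are ours, not part of the model:

```python
import numpy as np

def evaluate(actual, predicted):
    """Compute MSE, MAE, MAPE, SMAPE, and MASE for a forecast."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err) / actual) * 100
    smape = np.mean(np.abs(err) / ((actual + predicted) / 2)) * 100
    # MASE scales MAE by the in-sample one-step naive forecast error,
    # which is algebraically the same as the ratio displayed above.
    mase = mae / np.mean(np.abs(np.diff(actual)))
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "SMAPE": smape, "MASE": mase}
```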

2.2 The proposed algorithm

Given a cluster with m elements, if these elements converge to the same value v under some algorithm, then v is called the prototype element of the cluster. Let \({{T}} = \left\{ {{T_1},{T_2}, \ldots ,{T_N}} \right\}\) be the time series, and let \(V^{(t)}\) be the set of prototype elements of the clusters built at time t. We propose the following forecasting model based on the genetic algorithm and a clustering technique:

Step 1 Initialize \(t=0\) and \({{{V}}^{({{0}})}} = \left\{ {v_1^{(0)},v_2^{(0)}, \ldots ,v_N^{(0)}} \right\} = \left\{ {{X_1}, {X_2}, \ldots ,{X_N}} \right\} ,\) where \({X_c} = 10{T_c}/\max \left\{ {{T}} \right\} ,\ 1\le c \le N.\)

Step 2 Update the prototype elements using Formula (1):

$$\begin{aligned} v_i^{(t + 1)} = \frac{{\sum \nolimits _{j = 1}^N {f\left( {v_i^{(t)},v_j^{(t)}} \right) v_j^{(t)}} }}{{\sum \nolimits _{j = 1}^N {f\left( {v_i^{(t)},v_j^{(t)}} \right) } }}, 1\le i \le N, \end{aligned}$$
(1)

where

$$\begin{aligned} f\left( {v_i^{(t)},v_j^{(t)}} \right) = \left\{ {\begin{array}{*{20}{l}} {\exp \left( { - \frac{{{d_E}\left( {v_i^{(t)},v_j^{(t)}} \right) }}{\lambda }} \right) }&{}{\; \mathrm{if} \;\; {d_E}\left( {v_i^{(t)},v_j^{(t)}} \right) \le \mu \alpha _{ij}(t)},\\ 0&{}{\mathrm{{otherwise,}}} \end{array}} \right. \end{aligned}$$
(2)

where \(\alpha _{ij}\left( t\right) =\alpha _{ij}\left( t-1\right) /\left[ 1+\alpha _{ij}\left( t-1\right) f\left( v_i^{(t-1)},v_j^{(t-1)} \right) \right]\) is the balance factor, and \(\alpha _{ij}\left( 0\right) =1.\) \(\mu =\sum \nolimits _{i<j}{d_E\left( v_i^{(t)},v_j^{(t)} \right) }/{{N}\atopwithdelims (){2}}\) is the average of Euclidean distance \(d_E\left( v_i^{(t)},v_j^{(t)} \right)\), \(\lambda =\sigma /r\), \(\sigma =\sqrt{\sum \nolimits _{i<j}{\left[ d_E\left( v_i^{(t)},v_j^{(t)} \right) -\mu \right] ^2}/{{N}\atopwithdelims (){2}}}\) is the standard deviation, and r is a constant.

Step 3 Repeat Step 2 until \(\left\| {{V^{(t + 1)}} - {V^{(t)}}} \right\| = {\max _i}\left\{ {\left| {v_i^{(t + 1)} - v_i^{(t)}} \right| } \right\} < \varepsilon .\)

\(v^{(t+1)}\) determined by (1) is an expansion or narrowing of \(v^{(t)}\) such that the elements in the same group move toward a common prototype element. That is, over the iterations of Step 2, each element of X converges to the prototype element of the group containing it. Step 3 stops when the difference between all elements of two successive iterations is less than \(\varepsilon\). This value affects the number of divided groups as well as the computational cost: the number of iterations increases as \(\varepsilon\) decreases. In this study, we take \(\varepsilon =10^{-4}\) for all numerical examples.
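A minimal sketch of Phase 1 (Steps 1 to 3) in Python/NumPy may clarify the update; the function name, the default value of r, and the numerical guard on \(\lambda\) are our illustrative choices:

```python
import numpy as np

def phase1(T, r=2.0, eps=1e-4, max_iter=1000):
    """Steps 1-3: pull each element of the series toward its group's prototype."""
    v = 10.0 * np.asarray(T, dtype=float) / np.max(T)  # Step 1: X_c = 10 T_c / max{T}
    N = len(v)
    alpha = np.ones((N, N))                            # balance factors, alpha_ij(0) = 1
    for _ in range(max_iter):
        d = np.abs(v[:, None] - v[None, :])            # pairwise distances d_E(v_i, v_j)
        pairs = d[np.triu_indices(N, k=1)]
        mu = pairs.mean()                              # mean pairwise distance
        lam = max(pairs.std() / r, 1e-12)              # lambda = sigma / r (guarded)
        f = np.where(d <= mu * alpha, np.exp(-d / lam), 0.0)  # Eq. (2)
        v_new = (f @ v) / f.sum(axis=1)                # Eq. (1): weighted average
        alpha = alpha / (1.0 + alpha * f)              # update the balance factor
        if np.max(np.abs(v_new - v)) < eps:            # Step 3 stopping rule
            return v_new
        v = v_new
    return v
```

The number of groups k is then the number of distinct surviving prototypes, e.g. `len(np.unique(np.round(v, 2)))`.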

Step 4 Encode the clustering solutions. In a general genetic algorithm, each variable is represented by a gene, and a chromosome is a set of genes that represents a solution to the problem. In the proposed algorithm, a chromosome M is formed by kp genes representing k clusters.

Step 5 Initialize N chromosomes and evaluate their improved Davies–Bouldin (IDB) index [18] by (3); a code sketch follows the definitions below.

$$\begin{aligned} IDB = \frac{1}{k}\sum \limits _{i = 1}^k {\mathop {\max }\limits _{j \ne i} \left\{ {\frac{{\frac{1}{{\left| {{C_i}} \right| }}\sum \nolimits _{X \in {C_i}} {{d_E}(X,{{\bar{X}}_i})} + \frac{1}{{\left| {{C_j}} \right| }}\sum \nolimits _{X \in {C_j}} {{d_E}(X,{{{\bar{X}}}_j})} }}{{{d_E}({{{\bar{X}}}_i},{{\bar{X}}_j})}}} \right\} } , \end{aligned}$$
(3)

where

  • \({{\bar{X}}_i}\) and \({{\bar{X}}_j}\) are the centroids of \(C_i\) and \(C_j\).

  • \(d_E (.)\) is the Euclidean distance.

  • \(\left| {{C_i}} \right|\) is the number of elements in cluster \(C_i\).
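Eq. (3) translates directly into code; the sketch below assumes one-dimensional data, a label vector assigning each element to a cluster, and that no two centroids coincide (the function name is illustrative):

```python
import numpy as np

def idb_index(X, labels):
    """Improved Davies-Bouldin index of Eq. (3); smaller means better clusters."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean() for c in ids])
    # mean within-cluster distance to the centroid, for each cluster
    spread = np.array([np.mean(np.abs(X[labels == c] - m))
                       for c, m in zip(ids, centroids)])
    k = len(ids)
    worst = [max((spread[i] + spread[j]) / abs(centroids[i] - centroids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k
```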

Step 6 Utilize the selection, crossover, and mutation operators:

  • Crossover Perform the crossover operator on chromosomes with probability 0.85. Let \(L_1\) and \(L_2\) be two parent chromosomes; then, the child chromosome is created as follows (the three operators of this step are also sketched in code after Step 8):

    $$\begin{aligned}Child=L_1 + 0.85\,(L_2 - L_1).\end{aligned}$$

    Applying this in the same way to all chromosomes of the population, we obtain a new population.

  • Mutation Let h be the previous value of the mutated gene; the new value \(h'\) is computed as follows:

    $$\begin{aligned} h' = \left\{ {\begin{array}{*{20}{l}} {\left( {1 \pm 2\delta } \right) h}&{}{\quad \mathrm{if}\;\; h \ne 0,}\\ { \pm 2\delta }&{}{\quad \mathrm{if}\;\; h = 0,} \end{array}} \right. \end{aligned}$$

    where \(\delta\) is a random number belonging to the interval \(\left[ 0,1\right]\), and the \(+\) or − sign occurs with equal probability.

  • Selection The roulette wheel strategy [33] is used to implement the selection operation. The probability of choosing the ith chromosome is determined by (4):

    $$\begin{aligned} {p_i} = \frac{{{IDB_i}}}{{\sum \nolimits _{j = 1}^N {{IDB_j}} }}, \end{aligned}$$
    (4)

    where \(IDB_i\) is the fitness value of the ith chromosome and N is the size of the current population.

Step 7 Calculate the IDB index of the chromosomes obtained in Step 6.

Step 8 Replace the current clustering solutions by the new ones having smaller IDB indexes.

Repeat Step 5 to Step 7 until \(iter>maxiter\), where iter and maxiter are the current number of iterations and the required number of iterations of the proposed algorithm, respectively.
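The three operators of Step 6 can be sketched as follows (illustrative only: chromosomes are taken to be NumPy vectors of cluster centers, and we follow Eq. (4) literally for the selection probabilities):

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover(L1, L2, p=0.85):
    """Blend crossover: Child = L1 + 0.85 (L2 - L1), applied with probability 0.85."""
    return L1 + 0.85 * (L2 - L1) if rng.random() < p else L1.copy()

def mutate(h):
    """Mutate one gene: h' = (1 +/- 2 delta) h if h != 0, else +/- 2 delta."""
    delta = rng.random()                     # delta is uniform on [0, 1]
    sign = rng.choice([-1.0, 1.0])           # + or - with equal probability
    return (1 + sign * 2 * delta) * h if h != 0 else sign * 2 * delta

def roulette_select(population, idb_values):
    """Roulette wheel selection with probabilities p_i from Eq. (4)."""
    p = np.asarray(idb_values, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(population), size=len(population), p=p)
    return [population[i].copy() for i in idx]
```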

The parameters of the genetic algorithm used in the proposed model are summarized in Table 1.

Table 1 The used parameters of genetic algorithm

Step 9 Let \({\mu _{ic}} \in U^{(0)}\) be the result of fuzzy clustering at the initial time \(t=0\), and let \(M=(M_i), 1\le i \le k,\) be the optimal cluster centers. Establish the first partition matrix with elements computed by (5).

$$\begin{aligned} {\mu _{ic}} = \left\{ {\begin{array}{*{20}{l}} {1 \quad \mathrm{if}\;\; {X_c} \in {C_i},}\\ {0 \quad \mathrm{otherwise},} \end{array}} \right. \quad 1 \le c \le N,\ 1 \le i \le k. \end{aligned}$$
(5)

Find the prototype elements of the clusters by (6).

$$\begin{aligned} {\mathrm{{w}}_i} = \frac{{\sum \nolimits _{c = 1}^N {{{\left( {{\mu _{ic}}} \right) }^m}{X_c}} }}{{\sum \nolimits _{c = 1}^N {{{\left( {{\mu _{ic}}} \right) }^m}} }}, \end{aligned}$$
(6)

where \({\mu _{ic}} \in {U^{(t)}},1 \le i \le k,\) is the fuzzy membership of element c in cluster i, and \(w_i\) is the official centroid of cluster i.

The value of m in (6) is the fuzziness degree. When \(m = 1\), fuzzy clustering reduces to non-fuzzy clustering; when \(m \rightarrow \infty\), the partition becomes completely fuzzy with \(\mu _{ic}=1/k\). Although [11, 12, 39] proposed rules to find the supremum of m, the best value of m has still not been determined. After experimenting with many series, we take \(m = 2\) in our applications.

Step 10 Update the new partition matrix \({U^{\left( {t + 1} \right) }},\) where each element of \({U^{\left( {t + 1} \right) }}\) is determined by (7):

$$\begin{aligned} {\mu _{ic}} = \frac{{{d_E}{{\left( {{\mathrm{{w}}_i},{X_c}} \right) }^{ - 2}}}}{{\sum \nolimits _{j = 1}^k {{d_E}{{\left( {{\mathrm{{w}}_j},{X_c}} \right) }^{ - 2}}} }},\quad 1 \le i \le k,\ 1 \le c \le N, \end{aligned}$$
(7)

with \({d_E}\left( {{w_i},{X_c}} \right)\) being the Euclidean distance between the cluster center \(w_i\) and the original datum \(X_c\).

Step 11 Repeat Step 9 and Step 10 until

$$\begin{aligned} \left\| {{U^{(t + 1)}} - {U^{(t)}}} \right\| =\mathrm{max}_{ic}\{|\mu _{ic}^{(t+1)}-\mu _{ic}^{(t)}| \}< \varepsilon . \end{aligned}$$

Step 12 Calculate the center \((M_i)\) of each cluster and forecast \(Y_c\) according to the following rule:

$$\begin{aligned}{Y_c} = \sum \limits _{i = 1}^k {M_i^T{\mu _{ic}}},\quad 1\le c \le N, \end{aligned}$$

where \({M^T} = {\left( {{M_i} \times \max \left\{ {{T}} \right\} /10} \right) ^T}\).
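A compact sketch of Phase 3 (Steps 9 to 12) with m = 2 follows; for brevity it starts directly from the Phase 2 centroids rather than the crisp partition of Eq. (5), and the small constant guarding zero distances and the function name are our additions:

```python
import numpy as np

def phase3_forecast(T, M, m=2, eps=1e-4, max_iter=300):
    """Steps 9-12: memberships (Eq. 7), prototypes (Eq. 6), forecast (Step 12)."""
    T = np.asarray(T, dtype=float)
    X = 10.0 * T / T.max()                   # same normalization as Step 1
    w = np.asarray(M, dtype=float)           # start from the Phase 2 centroids
    U = np.zeros((len(w), len(X)))
    for _ in range(max_iter):
        d2 = np.maximum((w[:, None] - X[None, :]) ** 2, 1e-12)  # squared distances
        U_new = (1.0 / d2) / (1.0 / d2).sum(axis=0)             # Eq. (7) with m = 2
        w = (U_new ** m @ X) / (U_new ** m).sum(axis=1)         # Eq. (6)
        if np.max(np.abs(U_new - U)) < eps:                     # Step 11 stopping rule
            U = U_new
            break
        U = U_new
    Y = (w * T.max() / 10.0) @ U             # Step 12: Y_c = sum_i M_i^T mu_ic
    return Y, U
```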

The proposed model includes three phases with 12 steps. Phase 1 comprises three steps (Step 1 to Step 3) that determine the suitable number of groups into which the series is divided. At the start of this phase, each element is considered a cluster; after many iterations, the elements of the same group converge to its prototype element. The number of iterations depends on the similarity level of the elements in the series. The result of this phase is the number of groups k into which the series is divided. In many existing models, k is chosen as a constant \((k = 5\) or \(k = 7)\); in the proposed model, k is determined by the automatic algorithm.

Phase 2 comprises five steps (Step 4 to Step 8) that find the elements of the k groups by the improved genetic algorithm. It builds the selection, crossover, and mutation operators with the IDB index as the objective function. First, the algorithm encodes the clustering solutions: each variable is represented by a gene, and a chromosome is the set of genes representing a solution. It then initializes N chromosomes, evaluates their IDB, performs the selection, crossover, and mutation operators, and recomputes the IDB. These steps are repeated until the IDB index is almost unchanged. In our experiments, this phase usually converges in fewer than 50 iterations.

Phase 3 comprises three steps (Step 9 to Step 12). It builds the fuzzy relationship of each element of the series to the groups established in Phase 2 and proposes the rule for interpolating the data. In this phase, Step 9 initializes the fuzzy relationship by (5); the elements of this matrix are then updated until the values of two consecutive iterations are almost unchanged. From the resulting matrix, we derive the forecasting rule.

To sum up, the proposed model performs three phases, each of which includes several steps and runs many iterations. Therefore, its computation is complex compared to other models. The flowchart of the proposed model is shown in Fig. 1.

Fig. 1 Flowchart of the proposed model

We have established a complete MATLAB procedure for the proposed model, which performs quickly and effectively on real data.

2.3 The convergence of the proposed model

The convergence of the proposed model is shown phase by phase. Phase 2 stops when the number of iterations reaches maxiter (we chose maxiter = 1000). The convergence of Phase 3 is similar to that of the fuzzy cluster analysis algorithm for discrete elements (FCM), which has been proven in many documents [31]. Therefore, we only consider the convergence of Phase 1, which is proved by Theorem 1.

Theorem 1

If the function f(u, v) in (2) satisfies:

(i):

\(0 \le f\left( {u,v} \right) \le 1\) and \(f\left( {u,v} \right) = 1\) when \(u=v.\)

(ii):

f(u, v) depends only on \(\left\| {u - v} \right\|\), the distance from u to v.

(iii):

f(u, v) is decreasing in \(\left\| {u - v} \right\| .\)

Then, there exist t and \(\varepsilon\) such that \(V^{(t)}\) whose elements are determined by (1) satisfies:

$$\begin{aligned} \left\| {{V^{(t+1)}} - {V^{(t )}}} \right\| < \varepsilon . \end{aligned}$$

Proof

Let \(C_1^{(t)}\) be the convex hull of \(\left\{ {v_1^{(t)},...,v_N^{(t)}} \right\} .\) Then, \(v_i^{(t + 1)} \in C_1^{(t)}\) because it is a weighted average of the \(v_j^{(t)},j = 1,...,N.\) Therefore, \(C_1^{(t)} \supseteq C\left( {\left\{ {v_1^{(t + 1)},...,v_N^{(t + 1)}} \right\} } \right) = C_1^{(t + 1)}.\) Since

$$\begin{aligned} C_1^{} = \mathop {\lim }\limits _{t \rightarrow \infty } C_1^{(t)}, \end{aligned}$$

there exists i such that \(\mathop {\lim }\nolimits _{t \rightarrow \infty } u_{1,i}^{(t)} = {u_{1,i}},\) where \(u_{1,i}^{(t)}\) is a vertex of \(C_1^{(t)}\). For each t and i, \(u_{1,i}^{(t)} = v_k^{(t)}\) for at least one k, so there exists j such that \(v_j^{(t)} = u_{1,i}^{(t)}\) for infinitely many values of t. Therefore, there exists a sequence \({t_n} \rightarrow \infty\) such that \(v_j^{({t_n})} = u_{1,i}^{({t_n})},\) which leads to \(\mathop {\lim }\nolimits _{n \rightarrow \infty } v_j^{({t_n})} = {u_{1,i}}.\) We consider two possible cases as follows:

  1. Case 1:

    If \(v_j^{(t)} = u_{1,i}^{(t)}\) for all but finitely many t, then \(\mathop {\lim }\nolimits _{t \rightarrow \infty } v_j^{(t)} = {u_{1,i}}.\)

  2. Case 2:

    Suppose there exist \(j' \ne j\) and \({s_n} \rightarrow \infty\) such that \(v_{j'}^{({s_n})} = u_{1,i}^{({s_n})}\) for all \(n.\) Assume that \(u_{1,i}^{(t)} = v_j^{(t)}\) or \(v_{j'}^{(t)}\) for all \(t>T.\) From Equation (2), if \(v_j^{(s)} = v_{j'}^{(s)}\) for some s, then \(v_j^{(t)} = v_{j'}^{(t)}\) for all \(t>s\). Therefore, for any \(s > 0,\) there exists \(t > s\) such that \(u_{1,i}^{(t)} = v_j^{(t)}\) and \(u_{1,i}^{(t + 1)} = v_{j'}^{(t + 1)}.\) We claim that this case, however, can never happen once t is large enough.

Without loss of generality, assume that \({u_{1,i}} = 0,\) \(v_j^{(t)} \le 0,\) and \(v_k^{(t)} > 0\) for \(k \ne j\) and \(k \ne j'.\) If \(v_{j'}^{(t + 1)}\) later becomes the new vertex, then \(v_{j'}^{(t + 1)} < v_j^{(t + 1)}.\)

Moreover, since \(v_{j'}^{(t + 1)}\) is the new vertex we have

$$\begin{aligned} v_{j'}^{(t + 1)} \le 0 \Rightarrow \sum \limits _{k = 1}^N {f(v_{j'}^{(t)},v_k^{(t)})v_k^{(t)} \le 0} . \end{aligned}$$

Since \(v_j^{(t)}\) is the current vertex, \(\left\| {v_j^{(t)} - v_k^{(t)}} \right\| > \left\| {v_{j'}^{(t)} - v_k^{(t)}} \right\|\) for all k. Then,

$$\begin{aligned} \sum \limits _{k = 1}^N {f(v_j^{(t)},v_k^{(t)})v_k^{(t)} \le } \sum \limits _{k = 1}^N {f(v_{j'}^{(t)},v_k^{(t)})v_k^{(t)} < 0} , \end{aligned}$$

and

$$\begin{aligned} 0 < \sum \limits _{k = 1}^N {f(v_j^{(t)},v_k^{(t)}) \le } \sum \limits _{k = 1}^N {f(v_{j'}^{(t)},v_k^{(t)})} , \end{aligned}$$

and we have

$$\begin{aligned} v_{j'}^{(t + 1)} < v_j^{(t + 1)}, \end{aligned}$$

which is a contradiction to (2). Therefore, \(u_{1,i}^{(t)} = v_j^{(t)}\) for some j and for all t large enough. Then, \(\mathop {\lim }\nolimits _{t \rightarrow \infty } v_j^{(t)} = {u_{1,i}}.\)

We can apply a similar argument to \(C_2\) as to \(C_1^{(t)}\): at least one element converges to each vertex of \(C_2.\) Then, we can run similar steps again for \(C_3,C_4,\ldots\) until all elements converge. This completes the proof of Theorem 1. \(\square\)

2.4 The computational complexity of the proposed algorithm

Let N be the number of elements in the series, k the number of clusters, p the number of dimensions, \(t_\mathrm{max}\) the number of iterations of the algorithm, and P the population size in the genetic algorithm. Based on the research of Hongchun et al. [26] and Xu et al. [51], the computational complexity of the proposed model is analyzed as follows:

  • Phase 1 (Step 1 to Step 3). Because the number of iterations is \(t_\mathrm{max}\), the computational complexity of this phase is \(O(t_\mathrm{max}N^2k)\): standardizing the time series needs O(N); updating the values of the prototypes needs \(O(N^2k)\) per iteration; and comparing two sets of prototypes over all iterations needs \(O(t_\mathrm{max}N).\)

  • Phase 2 (Step 4 to Step 8). This phase uses the genetic algorithm with some improvements. The computational complexity of the genetic algorithm is \(O(t_\mathrm{max}NPkp)\) for the following reasons:

         With one variable per gene, O(kp) is needed to initialize the kp genes of one chromosome; since P chromosomes must be initialized, initializing the whole population needs O(Pkp).

         In one iteration, O(Pkp) is required for chromosome crossing, and gene rearrangement within one chromosome requires \(O(N^2kp).\)

         Each mutation step runs O(Pkp) times, and O(NPkp) is required to select the best chromosomes in each iteration.

         The IDB index is used as the objective function and requires O(Nkp) to calculate an individual's fitness.

  • Phase 3 (remaining steps). This phase finds the fuzzy relationship of the elements to the clusters, so it has the same structure as the fuzzy c-means algorithm. Its computational complexity is \(O(t_\mathrm{max}Nkp)\), based on the research of Sreenivasarao and Vidyavathi [45].

     In short, the total computational complexity of the proposed algorithm is

$$\begin{aligned} O(t_\mathrm{max}N^2Pkp). \end{aligned}$$

On comparing this result with other ones, we have Table 2.

Table 2 Comparison of the computational complexity of models

Almost all popular fuzzy time series models have the same computational complexity as the model of Tai (2019) [46]. Table 2 shows that the computational complexity of the proposed model is higher than that of the other models.

3 Numerical example and comparisons

3.1 Numerical example

We use the EnrollmentUA series, studied in many works [13, 46, 47], to illustrate the developed algorithm. The values of the EnrollmentUA series are given in Column \(T_i\) of Table 3.

Step 1:

Initialize \(t=0\); because \(\mathrm{max}\{T\}=19337,\) \(V^{(0)}\) is computed in the third column of Table 3.

Step 2:

Calculating the prototype elements by Formula (1), we obtain \(V^{(1)}\) in Table 3.

Step 3:

Because \(\mathrm{max}_i \{ |v^{(1)}_i-v^{(0)}_i|\} = 0.223 > \varepsilon ,\) the iterations of Phase 1 continue. After 6 iterations of the above steps, Phase 1 stops. The result of these iterations is given in Table 3 and shown in Fig. 2.

Table 3 The detailed outcome of the first phase
Fig. 2 The convergence of EnrollmentUA data to 10 clusters

From Table 3 and Fig. 2, we see that the elements of the series converge to 10 elements. Therefore, we divide the given series into \(k = 10\) groups.
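With the Phase 1 sketch of Sect. 2.2, this step could be reproduced as follows (assuming `enrollment` holds the 22 raw EnrollmentUA values; the rounding tolerance is illustrative):

```python
# builds on the phase1 sketch and NumPy import of Sect. 2.2
v = phase1(enrollment)              # converges after 6 iterations on this series
k = len(np.unique(np.round(v, 2)))  # 10 distinct prototypes -> k = 10 groups
```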

Step 4 From the result of Phase 1, we encode the first chromosome as follows:

$$\begin{aligned}M^0=\{8.768;~6.791;~7.249;~8.136;~7.687;~ 8.245;~ 9.942;~ 9.533;~ 8.490;~ 9.737\}.\end{aligned}$$

Step 5 Initialize 100 chromosomes and calculate the objective function (IDB) for each chromosome in the population. The values of the 100 chromosomes and their IDB are shown in Table 10 (see Appendix A). Selecting the best chromosome, the one with the smallest IDB, we obtain chromosome 1 \((IDB=0.85).\) This result is used to create the new population for Step 6 and Step 7. The steps of Phase 2 continue until the IDB is unchanged, as shown in Fig. 3.

Fig. 3 The convergence of the proposed method in Phase 2 after 300 iterations

Step 8 When Step 8 ends, we have the following outcomes:

  • The value of the best objective function: \(IDB =0.1534.\)

  • The optimal clusters:

    $$\begin{aligned}\begin{array}{*{20}{l}} \begin{array}{l} {C_1} = \{ {X_5},{X_6},{X_{12}},{X_{13}}\} ;{C_2} = \{ {X_{11}}\} ;{C_3} = \{ {X_9},{X_{10}},{X_{17}}\} ;\\ {C_4} = \{ {X_4}\} ;{C_5} = \{ {X_{18}}\} ;{C_6} = \{ {X_{14}},{X_{15}}\} ;{C_7} = \{ {X_1},{X_2},{X_3}\} ; \end{array}\\ {{C_8} = \{ {X_{19}},{X_{20}},{X_{21}},{X_{22}}\} ;{C_9} = \{ {X_8}\} ;{C_{10}} = \{ {X_{16}}\} .} \end{array} \end{aligned}$$
  • The optimal centroid of clusters:

    $$\begin{aligned}M^{300} = \{7.995; 8.266; 8.719; 7.171; 7.600; 7.014; 6.751; 8.475; 9.810; 7.832\}.\end{aligned}$$

Step 9 Performing Phase 3, we have the first partition matrix with \(t=0\) as follows:

$$\begin{aligned}{\mu _{ic}} = \left[ {\begin{array}{cccccccccccccccccccccc} 0&{}0&{}0&{}0&{}1&{}1&{}0&{}0&{}0&{}0&{}0&{}1&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 1&{}1&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}1&{}1&{}1\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}0 \end{array}} \right] .\end{aligned}$$

Calculating the representative elements of the clusters by (6), we have:

$$\begin{aligned}{w_i} = \left[ {7.95;~ 6.98 ;~ 10.00 ;~ 9.39 ;9.79 ;~ 8.20 ;~ 8.48 ;~ 7.60 ; ~8.27;~ 8.72} \right] \end{aligned}$$

Step 10 Updating the new partition matrix, we obtain \(U^{(1)}:\)

$$\begin{aligned}{\mu ^{(1)} _{ic}} = \left[ {\begin{array}{rrrrrrrrrrrrrrrrrrrrrr} {0.030}&{}{0.001}&{}{0.044}&{}{0.000}&{}{0.906}&{}{0.963}&{}{0.422}&{}{0.000}&{}{0.001}&{}{0.001}&{}{0.000}&{}{0.956}&{}{0.806}&{}{0.659}&{}{0.699}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}\\ {0.831}&{}{0.992}&{}{0.723}&{}{0.000}&{}{0.002}&{}{0.001}&{}{0.005}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.001}&{}{0.003}&{}{0.013}&{}{0.011}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}\\ {0.004}&{}{0.000}&{}{0.003}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.002}&{}{0.000}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.000}&{}{0.001}&{}{0.002}&{}{0.002}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.016}&{}{1.000}&{}{1.000}&{}{0.010}\\ {0.006}&{}{0.000}&{}{0.005}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.003}&{}{0.000}&{}{0.002}&{}{0.002}&{}{0.000}&{}{0.000}&{}{0.002}&{}{0.004}&{}{0.003}&{}{0.000}&{}{0.000}&{}{1.000}&{}{0.003}&{}{0.000}&{}{0.000}&{}{0.004}\\ {0.005}&{}{0.000}&{}{0.004}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.002}&{}{0.000}&{}{0.001}&{}{0.001}&{}{0.000}&{}{0.000}&{}{0.001}&{}{0.002}&{}{0.002}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.979}&{}{0.000}&{}{0.000}&{}{0.984}\\ {0.020}&{}{0.001}&{}{0.025}&{}{0.000}&{}{0.042}&{}{0.012}&{}{0.335}&{}{1.000}&{}{0.003}&{}{0.003}&{}{0.000}&{}{0.019}&{}{0.093}&{}{0.067}&{}{0.063}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}\\ {0.014}&{}{0.001}&{}{0.016}&{}{0.000}&{}{0.008}&{}{0.003}&{}{0.036}&{}{0.000}&{}{0.017}&{}{0.011}&{}{1.000}&{}{0.004}&{}{0.016}&{}{0.022}&{}{0.021}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}\\ {0.060}&{}{0.004}&{}{0.146}&{}{1.000}&{}{0.012}&{}{0.010}&{}{0.027}&{}{0.000}&{}{0.001}&{}{0.001}&{}{0.000}&{}{0.006}&{}{0.019}&{}{0.170}&{}{0.142}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}\\ {0.019}&{}{0.001}&{}{0.022}&{}{0.000}&{}{0.025}&{}{0.008}&{}{0.154}&{}{0.000}&{}{0.004}&{}{0.004}&{}{0.000}&{}{0.011}&{}{0.052}&{}{0.049}&{}{0.046}&{}{1.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}\\ {0.011}&{}{0.000}&{}{0.011}&{}{0.000}&{}{0.003}&{}{0.002}&{}{0.014}&{}{0.000}&{}{0.971}&{}{0.976}&{}{0.000}&{}{0.002}&{}{0.007}&{}{0.012}&{}{0.011}&{}{0.000}&{}{1.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.001} \end{array}} \right] .\end{aligned}$$

Step 11 Repeating Step 9 and Step 10 for \(t=300\) iterations, Phase 3 stops. At that time, we obtain the matrix \(\mu _{ic}^{(300)}\) below, visualized in Fig. 4.

$$\begin{aligned}{\mu ^{(300)}_{ic}} = \left[ {\begin{array}{rrrrrrrrrrrrrrrrrrrrrr} {0.000}&{}{0.002}&{}{0.002}&{}{0.000}&{}{0.000}&{}{0.004}&{}{0.000}&{}{0.011}&{}{0.993}&{}{0.981}&{}{0.307}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.999}&{}{0.000}&{}{0.006}&{}{0.005}&{}{0.006}&{}{0.015}\\ {0.000}&{}{0.006}&{}{0.008}&{}{0.000}&{}{0.997}&{}{0.427}&{}{0.008}&{}{0.067}&{}{0.001}&{}{0.002}&{}{0.074}&{}{0.979}&{}{0.860}&{}{0.006}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.002}&{}{0.002}&{}{0.002}&{}{0.005}\\ {0.000}&{}{0.001}&{}{0.001}&{}{0.000}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.001}&{}{0.008}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.936}&{}{0.959}&{}{0.955}&{}{0.836}\\ {0.000}&{}{0.009}&{}{0.011}&{}{0.000}&{}{0.001}&{}{0.428}&{}{0.001}&{}{0.023}&{}{0.001}&{}{0.002}&{}{0.043}&{}{0.006}&{}{0.015}&{}{0.988}&{}{0.999}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.002}&{}{0.002}&{}{0.002}&{}{0.004}\\ {0.000}&{}{0.001}&{}{0.001}&{}{0.000}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.002}&{}{0.001}&{}{0.003}&{}{0.020}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{1.000}&{}{0.044}&{}{0.023}&{}{0.025}&{}{0.119}\\ {0.000}&{}{0.004}&{}{0.004}&{}{0.000}&{}{0.000}&{}{0.019}&{}{0.001}&{}{0.697}&{}{0.002}&{}{0.006}&{}{0.401}&{}{0.001}&{}{0.007}&{}{0.001}&{}{0.000}&{}{1.000}&{}{0.000}&{}{0.000}&{}{0.003}&{}{0.003}&{}{0.003}&{}{0.007}\\ {0.000}&{}{0.005}&{}{0.006}&{}{0.000}&{}{0.002}&{}{0.093}&{}{0.989}&{}{0.186}&{}{0.001}&{}{0.003}&{}{0.108}&{}{0.012}&{}{0.114}&{}{0.002}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.003}&{}{0.002}&{}{0.003}&{}{0.006}\\ {0.000}&{}{0.868}&{}{0.910}&{}{0.000}&{}{0.000}&{}{0.003}&{}{0.000}&{}{0.002}&{}{0.000}&{}{0.001}&{}{0.009}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.001}&{}{0.001}&{}{0.001}&{}{0.002}\\ {0.000}&{}{0.017}&{}{0.028}&{}{1.000}&{}{0.000}&{}{0.023}&{}{0.000}&{}{0.008}&{}{0.000}&{}{0.001}&{}{0.023}&{}{0.001}&{}{0.003}&{}{0.003}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.002}&{}{0.001}&{}{0.002}&{}{0.003}\\ {1.000}&{}{0.087}&{}{0.029}&{}{0.000}&{}{0.000}&{}{0.002}&{}{0.000}&{}{0.001}&{}{0.000}&{}{0.000}&{}{0.006}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.000}&{}{0.001}&{}{0.001}&{}{0.001}&{}{0.002} \end{array}} \right] . \end{aligned}$$
Fig. 4 Bar graph showing the relation between each element and the 10 clusters

Step 12 Applying the formula

$$\begin{aligned}{Y_c} = \sum \limits _{i = 1}^{10} {M_i^T\mu _{ic}^{(300)}} ,\quad 1 \le c \le 22,\end{aligned}$$

where

$$\begin{aligned} M =\,& {} {M^{300}}.\mathrm{max}\{ {T}\} /10 \\=\,& {} \{ 15460;15984;16859;13867;14696;13563;13055;16388;18970;15145\}, \end{aligned}$$

we obtain Table 4, which is illustrated in Fig. 5.

Table 4 Forecasting result of the recommended model
Fig. 5 Line graph of original and forecast data

Computing the parameters to evaluate the model, we have

$$\begin{aligned} MSE = 14087, MAE = 94.90, MAPE =0.57. \end{aligned}$$

Figure 5 shows that the actual values are almost identical to the forecast ones. This means that the proposed model is very suitable for forecasting this time series.

3.2 Comparison with the popular models

This section compares the new model with existing ones on well-known data sets. The considered series are EnrollmentUA [13], Taifex (Taiwan Stock Exchange) [15], Outpatient [23], and Foodgrain [25], and the compared models are Abbasov–Mamedova [1] (AM), [34] (L-C), [27] (Hua), [23] (B-R), [41] (Si), [52] (Y-H), [25] (Gh), [13] (Chen), [15] (C-K), [16] (C-H), [32] (Kha), [53] (Yus), [21] (Egr), and Tai [46]. For each data set, we consider two cases:

Case 1:

All of the series are used to build the models and to evaluate them by the parameters MAE, MAPE, and MSE.

Case 2:

Eighty percent of each series is taken as the training set to build the ARIMA (autoregressive integrated moving average), AM (Abbasov–Mamedova), and IFTS [46] models (the others only interpolate), and the remaining 20% of each series is used as the test set. The effectiveness of the models is again evaluated by the MAE, MAPE, and MSE.

  • For Case 1, we obtain the results in Table 5:

    Table 5 Parameters of models for the training sets

    Table 5 shows that the MAE, MAPE, and MSE of the new model are always smaller than those of the compared models for all data sets, demonstrating its outstanding advantages. For example, the MAPE of the proposed model for the EnrollmentUA, Taifex, Outpatient, and Foodgrain data sets is 0.57, 0.1, 0.47, and 2.06, respectively, while the other models have MAPE \(\in [1.02, 3.08]\) for EnrollmentUA, MAPE \(\in [0.16, 1.42]\) for Taifex, MAPE \(\in [1.09, 24.45]\) for Outpatient, and MAPE \(\in [4.53, 10.13]\) for Foodgrain. We obtain similar results for the MSE and MAE.

  • For Case 2, the results are given in Table 6 (ARIMAR, AMR, and IFTSR are the ARIMA, Abbasov–Mamedova, and IFTS models built on the original data set; ARIMAP, AMP, and IFTSP are the same models built on a training set interpolated by the proposed algorithm).

    Table 6 Parameters of models for the test sets

From Table 6, the proposed model again obtains the smallest MAE, MAPE, and MSE in comparison with the other models.

To sum up, Tables 5 and 6 show that the proposed model gives the best results in both interpolation and forecasting for all considered data sets, demonstrating its stability and advantages. Given the large number of compared models, this comparison is very meaningful in evaluating the proposed model. In our opinion, the reasons for this result are as follows. First, Phase 1 of the proposed model is an automatic algorithm that divides the series into an appropriate number of groups based on how similar the elements of the series are, while the other algorithms set the number of groups according to experience or linguistic level. Second, Phase 2 stops only when the IDB index is optimized, while the others rely on a distance criterion and do not use a parameter to evaluate the built clustering. Finally, the relationship between each element of the series and the divided groups is established appropriately: it is built on the fuzzy clustering algorithm, while the others often use a fixed, specific expression.

3.3 Comparison with M3-Competition data

The third competition data set, called the M3-Competition, expands the M1-Competition and M2-Competition data and was built by [44]. It is very well known in time series research and is often used to compare the efficiency of models. The data set has 3003 series of many different kinds, including yearly, quarterly, monthly, daily, and others, belonging to different areas such as micro, industry, macro, finance, demographics, and other. The specifics of this set are presented in many documents such as [44, 46, 47].

For these data, according to [44], the important models to compare are ForecastPro, ForecastX, BJ automatic, Autobox1, Autobox2, Autobox3, Hybrid, ETS, and AutoARIMA (https://robjhyndman.com/m3comparisons.R). In two recent studies, Tai [46] and Tai and Nghiep [47] showed their outstanding advantages over all the above models; therefore, we only compare the proposed result with those in [46, 47]. Let E(MAPE), E(MASE), and E(SMAPE) be the averages of MAPE, MASE, and SMAPE, respectively. The result is presented in Table 7.

Table 7 Value of E(MAPE), E(MASE), and E(SMAPE) for models

Table 7 shows that the new model has more advantages than the compared existing models. Given the large number of series and the different features of the M3-Competition data set, this comparison demonstrates the outstanding advantages of the new model over the existing ones.

4 A real application for COVID-19 victims in Vietnam

The COVID-19 pandemic is a global problem that most countries in the world are trying to prevent. In pandemic prevention, forecasting the number of victims provides important information because it is the basis for governments' prevention strategies. In this section, we use the proposed model to predict the number of COVID-19 victims in Vietnam. This is performed with the following steps:

  • The data are divided into two parts: 80% for the training set (97 dates) and the remaining 20% for the test set (24 dates). Interpolating the training set by the proposed model, we have the results shown in Fig. 6. The proposed model has parameters MAE = 1.691, MAPE = 7.169, MSE = 6.288, SMAPE = 0.483, and MASE = 0.475. (This protocol is also sketched in code after this list.)

    Figure 6 shows that the forecast and actual values are almost identical.

    Using the original and interpolated data from the training set to forecast the 24 test dates with the ARIMA, AM, SEDMFOA [17], and IFTS models, we obtain Table 8 and Fig. 7.

    Figure 7 and Table 8 show that the models built from the data interpolated by the proposed model are better than those built from the original data. Among them, ARIMAP gives the best result, with MAPE = 3.54, MAE = 11.1, and MSE = 182.92. These results have outstanding advantages compared to the other models; therefore, we use this model to forecast the future.

  • Interpolating all the data by the proposed model, we obtain Fig. 8, with MAE = 1.89, MAPE = 2.38, MSE = 11.10, SMAPE = 0.87, and MASE = 0.55.

    Using the data from Fig. 8 and forecasting the next several days with the ARIMAP model, we obtain Table 9.
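The experimental protocol above can be summarized by the following sketch; the variable names, the statsmodels ARIMA order, and the reuse of the earlier sketches (`phase3_forecast`, `evaluate`) are our illustrative assumptions, not the exact MATLAB procedure:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA  # one possible ARIMA implementation

# daily: array of the 121 daily COVID-19 counts for Vietnam (hypothetical name)
split = int(0.8 * len(daily))                  # 97 training dates, 24 test dates
train, test = daily[:split], daily[split:]

# Interpolate the training set with the proposed model, then fit ARIMA on the
# interpolated series -> the "ARIMAP" variant of Table 8.
smoothed, _ = phase3_forecast(train, M_opt)    # M_opt: centroids from Phase 2
fit = ARIMA(smoothed, order=(1, 1, 1)).fit()   # order chosen for illustration
pred = np.asarray(fit.forecast(steps=len(test)))
print(evaluate(test, pred))                    # MAE, MAPE, MSE, ... on the test set
```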

Fig. 6 Actual and forecast values for the training set

Fig. 7 Forecast values of the models on the test set

Fig. 8 Interpolation of all data by the proposed model

Table 8 Comparison of the models of the test set
Table 9 Forecast for the COVID-19 victims for the next 15 days

Table 9 shows that in the coming days, the number of COVID-19 victims in Vietnam barely increases. This is consistent with the reality in Vietnam.

5 Conclusion

This research makes significant contributions to the application of unsupervised learning in building forecasting models for time series. From the automatic fuzzy genetic clustering algorithm, we make several important improvements in the new model: a method to find the number of groups into which the universal set is divided, an algorithm to determine the probability with which each element of the series belongs to the divided groups, and a principle to interpolate the series from these results. Implemented on 3007 series with very different sizes and characteristics, the proposed model has shown its stability and its advantages over the existing models via parameters such as MAPE, MASE, and SMAPE.

A further contribution of this study is the prediction of COVID-19 victims in Vietnam. The performance results show that the proposed model forecasts well on this data set. Because the predictive model is built entirely on the relationships among the data in the series, we believe it can also obtain relevant results in predicting COVID-19 victims in other countries. This research can thus contribute to the early warning of COVID-19 infection risk, which is our next application direction in the near future.

This study also faces a computational problem. Compared with other popular models, the calculation in the proposed model is more complicated: it has 12 steps divided into three phases, all set up in one model. As a result, its time cost is often higher than that of the others. In addition, in this study we have only been concerned with the optimization of the algorithm itself.