1 Introduction

Time series data analysis and forecasting covers problems such as predicting daily temperature, short-range and long-range rainfall amounts, daily stock index prices, and the economic growth of a country. Fuzzy logic can deal with the uncertainties involved in such time series events. Using the concept of fuzzy logic, Song and Chissom [1] introduced the first model in 1991 to handle the uncertainty and imprecise knowledge contained in time series data. In their approach, each time series value (TSV) is represented by a fuzzy linguistic variable, and these variables are modeled and simulated together to obtain the predicted value. They referred to their model as “fuzzy time series (FTS)”. Since then, researchers have suggested various modifications [2,3,4,5] to improve the predictive skill of FTS models on one-factor time series data sets.

In the FTS modeling approach, four factors predominantly affect the performance of the model [5]: (a) the selection of the effective length of intervals, (b) the determination of the degree of membership (DM) of each historical TSV, (c) the inclusion of high-order fuzzy logical relationships (FLRs), and (d) the defuzzification operation. Accordingly, the contribution of this work is fourfold: (a) for the selection of the effective length of intervals, a linear programming (LP) model is formulated and integrated with the FTS model, with the objective of minimizing the proximities between the lower and upper bounds of the corresponding intervals; (b) the DM of each historical TSV is determined based on its position in the universe of discourse; (c) high-order FLRs (see Eq. (2)) are employed to improve the performance of the model, because they consider more linguistic values than first-order FLRs [5]; and (d) the defuzzification operation uses the DM of each TSV together with the mid-value of its corresponding interval.

Based on these improvements, this study presents two models. The first model is exclusively based on the concept of FTS. The second model is based on the integration of an LP model with the FTS model (i.e., first model). The main intent of this LP model is to optimize the proximities between intervals. To validate the proposed models, experiments are conducted with the TAIEX index data set [6].

The rest of the article is organized as follows. Sect. 2 discusses the basic theory of the FTS modeling approach, together with the basics of LP and the formulation of the LP model. The two proposed models are presented in Sect. 3. Sect. 4 presents the empirical analyses, followed by the conclusion in Sect. 5.

2 Preliminaries

In this section, basic concepts of the FTS and LP are briefly discussed.

2.1 FTS: Basic Definitions

Here, a few important definitions of FTS are presented. In FTS, each TSV is represented by a fuzzy linguistic variable.

Definition 1

(Fuzzy time series (FTS)) [1]. Let \(M(t)(t=\ldots ,0,1,2,\ldots ) \subseteq R\) be the universe of discourse on which the fuzzy sets \(\mu _i(t)(i=1, 2, \ldots )\) are defined, and let G(t) be the collection of the \(\mu _i(t)(i=1, 2, \ldots )\). Then, G(t) is called an FTS on \(M(t)(t=\ldots ,0,1,2,\ldots )\).

Definition 2

(Fuzzy logical relationship (FLR)) [1]. Suppose that \(G(t-1)={A}_i\) and \(G(t)={A}_j\), where G(t) is caused by \(G(t-1)\). The relationship between G(t) and \(G(t-1)\) is termed an FLR between \({A}_i\) and \({A}_j\), and is denoted as:

$$\begin{aligned} {A}_i \rightarrow {A}_j, \end{aligned}$$
(1)

where \({A}_i\) and \({A}_j\) are termed as left-hand side (LHS) and right-hand side (RHS) of the FLR “\(A_i \rightarrow A_j\)”, respectively.

Definition 3

(High-order FLR) [5]. If G(t) is caused by more than one event, \(G(t-1), G(t-2), \ldots ,\) and \(G(t-n)\) \((n > 1)\), then the relationship is referred to as a high-order (nth-order) FLR. It can be represented as:

$$\begin{aligned} G(t-n),\ldots ,G(t-2),G(t-1)\rightarrow G(t) \end{aligned}$$
(2)

2.2 Formulation of LP Model for the Proximity Problem

Let \(U=[L_B,U_B]\) be the universe of discourse, where \(L_B\) and \(U_B\) are its lower and upper bounds. U is discretized into n intervals of equal length: \(I_1=[l_1,u_1]\), \(I_2=[l_2,u_2]\), \(\ldots \), \(I_n=[l_n,u_n]\). Let \(M_1, M_2, \ldots , M_n\) be the centroids (mid-values) of the corresponding intervals. The process of interval optimization is depicted in Fig. 1. In this study, it is assumed that this process starts from the first lower bound (i.e., \(l_1\)), moves to the first upper bound (i.e., \(u_1\)), and then proceeds through the second, third, and subsequent upper bounds until the last upper bound is covered (i.e., \(u_2, u_3, \ldots , u_n\)). Based on this assumption, the objective function (OF) and constraints are defined as follows [7].

Fig. 1. Process of interval optimization.

The LP model formulation. Let \(x_i\) be the effective length of interval required to maintain the proximity (\(i=1,2,\ldots ,n+1\)).

The LP model. The OF is defined as:

$$\begin{aligned} \text{ Min } \text{(total } \text{ lengths) } \text{ Z } = l_1x_1 + u_1x_2 + u_2x_3 + u_3x_4 + \cdots + u_nx_{n+1} \end{aligned}$$
(3)

subject to the constraints

$$\begin{aligned} l_1x_1 + u_1x_2&\ge M_1 \nonumber \\ l_2x_2 + u_2x_3&\ge M_2 \nonumber \\ l_3x_3 + u_3x_4&\ge M_3 \nonumber \\ \vdots \nonumber \\ l_nx_n + u_nx_{n+1}&\ge M_n \end{aligned}$$
(4)

and \(x_1, x_2, \cdots , x_{n+1} \ge 0\).

In Eq. (3), the coefficients \(b_i=\{l_1,u_1,u_2,u_3,\ldots ,u_n\}\) represent the per-unit contribution of the decision variables \(x_1, x_2, \ldots , x_{n+1}\) to the value of the OF. In Eq. (4), the pairs \(a_{ij}=\{(l_1,u_1),(l_2,u_2),(l_3,u_3),\ldots ,(l_n,u_n)\}\) are referred to as the input-output coefficients; they are the interval bounds that multiply the decision variables in each constraint. In a general LP model, such coefficients can be positive, negative, or zero. The right-hand-side values \(m_i=\{M_1,M_2,M_3,\ldots ,M_n\}\) play the role of the total availability of the ith resource. Because the forecasting accuracy of the FTS modeling approach depends mainly on the selection of appropriate interval lengths, the LP model with the OF in Eq. (3) and the constraints in Eq. (4) is used to select these lengths.
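For concreteness, the coefficient structure of Eqs. (3) and (4) can be assembled programmatically. The Python sketch below is a minimal illustration, assuming that the \(M_i\) are the interval mid-values and that there are n + 1 decision variables, as the last term \(u_nx_{n+1}\) of Eq. (3) implies; the helper names `build_lp`, `objective`, and `is_feasible` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound l_i, upper bound u_i)


def build_lp(intervals: List[Interval]):
    """Return (c, A, m) for Eqs. (3)-(4): minimise c.x subject to A x >= m, x >= 0.

    There are n constraints and n + 1 decision variables x_1, ..., x_{n+1}.
    """
    n = len(intervals)
    lows = [lo for lo, _ in intervals]
    ups = [hi for _, hi in intervals]
    mids = [(lo + hi) / 2.0 for lo, hi in intervals]  # M_i taken as interval mid-values

    # OF coefficients (l_1, u_1, u_2, ..., u_n), one per x_1, ..., x_{n+1}.
    c = [lows[0]] + ups
    # Constraint i (0-based): l_i * x_i + u_i * x_{i+1} >= M_i.
    A = []
    for i in range(n):
        row = [0.0] * (n + 1)
        row[i] = lows[i]
        row[i + 1] = ups[i]
        A.append(row)
    return c, A, mids


def objective(c: List[float], x: List[float]) -> float:
    """Value of the OF in Eq. (3) for a candidate x."""
    return sum(ci * xi for ci, xi in zip(c, x))


def is_feasible(A: List[List[float]], m: List[float], x: List[float], tol: float = 1e-9) -> bool:
    """Check x >= 0 and every '>=' constraint of Eq. (4)."""
    if any(xi < -tol for xi in x):
        return False
    return all(sum(a * xi for a, xi in zip(row, x)) >= mi - tol for row, mi in zip(A, m))
```

For the three toy intervals [10, 12], [12, 14], [14, 16], `build_lp` yields the cost vector [10, 12, 14, 16] and the right-hand sides [11, 13, 15].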

To solve this LP model with the simplex method [7], the problem must first be converted into its standard form. The minimization OF in Eq. (3) is therefore converted into a maximization problem using the relation:

$$\begin{aligned} \text{ Min } \text{(total } \text{ lengths) } \text{ Z } = - \text{ Max }~Z^* \end{aligned}$$
(5)

where \(Z^*=-Z\).

Moreover, all constraints in Eq. (4) are of type “\(\ge \)”, so a surplus variable (\(S_i\)) is subtracted from, and an artificial variable (\(A_i\)) is added to, each of the n constraints. The resulting constraints become:

$$\begin{aligned} l_ix_i + u_ix_{i+1} - S_i + A_i = M_i, \quad i=1,2,\ldots ,n \end{aligned}$$
(6)

where \(x_1, x_2, \ldots , x_{n+1} \ge 0\) and \(S_i, A_i \ge 0\) for \(i=1,2,\ldots ,n\).

Each surplus variable (\(S_i\)) represents the amount by which the left-hand side of the ith constraint exceeds \(M_i\); such variables are also termed negative slack variables and carry a zero coefficient in the OF. Each artificial variable (\(A_i\)) has no physical meaning: it is added only to provide an initial basic feasible solution for the simplex method and must be driven to zero in the optimal solution, e.g., by penalizing it in the OF (Big-M method) or by using the two-phase method.

Now, the OF (see Eq. (3)) and the constraints (see Eq. (4)) can be converted into standard form based on Eqs. (5) and (6):

$$\begin{aligned} \text{ Max }~Z^* = - l_1x_1 - u_1x_2 - u_2x_3 - u_3x_4 - \cdots - u_nx_{n+1} \end{aligned}$$
(7)

subject to the constraints

$$\begin{aligned} l_1x_1 + u_1x_2 - S_1 + A_1&= M_1 \nonumber \\ l_2x_2 + u_2x_3 - S_2 + A_2&= M_2 \nonumber \\ l_3x_3 + u_3x_4 - S_3 + A_3&= M_3 \nonumber \\ \vdots \nonumber \\ l_nx_n + u_nx_{n+1} - S_n + A_n&= M_n \end{aligned}$$
(8)

and \(x_1, x_2, \cdots , x_{n+1} \ge 0\); \(S_1, S_2, \cdots , S_n \ge 0\); \(A_1, A_2, \cdots , A_n \ge 0\).
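The standard form of Eq. (8) can likewise be sketched as a coefficient matrix whose columns are the decision, surplus, and artificial variables. This minimal Python sketch assumes the column order \(x_1,\ldots ,x_{n+1}, S_1,\ldots ,S_n, A_1,\ldots ,A_n\) (a hypothetical choice) and again takes the \(M_i\) as interval mid-values; the name `standard_form` is hypothetical. Setting every \(A_i=M_i\) and all other variables to zero gives the initial basic feasible solution from which the simplex method starts.

```python
from typing import List, Tuple


def standard_form(intervals: List[Tuple[float, float]]):
    """Equality rows of Eq. (8): l_i x_i + u_i x_{i+1} - S_i + A_i = M_i.

    Columns (hypothetical order): x_1..x_{n+1}, S_1..S_n, A_1..A_n.
    Returns (rows, rhs, column_names).
    """
    n = len(intervals)
    names = ([f"x{j}" for j in range(1, n + 2)]
             + [f"S{j}" for j in range(1, n + 1)]
             + [f"A{j}" for j in range(1, n + 1)])
    rows, rhs = [], []
    for i, (low, up) in enumerate(intervals):
        row = [0.0] * (3 * n + 1)
        row[i] = low                   # l_i x_i
        row[i + 1] = up                # u_i x_{i+1}
        row[(n + 1) + i] = -1.0        # - S_i (surplus variable)
        row[(2 * n + 1) + i] = 1.0     # + A_i (artificial variable)
        rows.append(row)
        rhs.append((low + up) / 2.0)   # M_i as the interval mid-value
    return rows, rhs, names
```

With x = 0 and S = 0, each row reduces to \(A_i = M_i\), which is why the artificial variables form the starting basis.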

Detailed descriptions of solving this LP model with the simplex method can be found in [7].
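As a hedged illustration only, the Python sketch below solves the LP of Eqs. (3) and (4) by a different but standard route: instead of handling artificial variables (Big-M or two-phase), it applies the ordinary tableau simplex method, with Bland's anti-cycling rule, to the dual problem, which maximizes \(\sum _i M_iy_i\) subject to one “\(\le \)” constraint per \(x_j\) (with the OF coefficient of \(x_j\) on the right-hand side) and \(y \ge 0\). This is valid here because all interval bounds are positive, so \(y = 0\) is a feasible starting point; the primal \(x_j\) are then read off the objective row of the final tableau. The function name `solve_interval_lp` is hypothetical, and the \(M_i\) are again taken as interval mid-values.

```python
from typing import List, Tuple


def solve_interval_lp(intervals: List[Tuple[float, float]], eps: float = 1e-9):
    """Minimise Eq. (3) subject to Eq. (4) by running the simplex method on the dual.

    Returns (x, z): x = [x_1, ..., x_{n+1}] and the optimal OF value z.
    Assumes all bounds are positive and M_i is the mid-value of interval i.
    """
    n = len(intervals)
    cost = [intervals[0][0]] + [u for _, u in intervals]   # l_1, u_1, ..., u_n
    rhs = [(l + u) / 2.0 for l, u in intervals]            # M_1, ..., M_n
    m = n + 1                                              # one dual row per x_j

    # Dual row j: sum_i a_ij * y_i + s_j = cost_j, where a_ij is the
    # coefficient of x_j in constraint i of Eq. (4).
    tableau = []
    for j in range(m):
        row = [0.0] * (n + m + 1)
        if j < n:
            row[j] = intervals[j][0]          # l_j multiplies x_j in constraint j
        if j >= 1:
            row[j - 1] = intervals[j - 1][1]  # u_{j-1} multiplies x_j in constraint j-1
        row[n + j] = 1.0                      # slack s_j
        row[-1] = cost[j]
        tableau.append(row)
    obj = [-v for v in rhs] + [0.0] * m + [0.0]   # reduced costs for max sum M_i y_i
    basis = [n + j for j in range(m)]

    while True:
        # Bland's rule: entering column = lowest index with a negative reduced cost.
        enter = next((k for k in range(n + m) if obj[k] < -eps), None)
        if enter is None:
            break
        leave, best = None, 0.0
        for r in range(m):
            a = tableau[r][enter]
            if a > eps:
                ratio = tableau[r][-1] / a
                if (leave is None or ratio < best - eps
                        or (abs(ratio - best) <= eps and basis[r] < basis[leave])):
                    leave, best = r, ratio
        if leave is None:
            raise ValueError("dual is unbounded, so Eq. (4) is infeasible")
        piv = tableau[leave][enter]
        tableau[leave] = [v / piv for v in tableau[leave]]
        for r in range(m):
            f = tableau[r][enter]
            if r != leave and f != 0.0:
                tableau[r] = [a - f * p for a, p in zip(tableau[r], tableau[leave])]
        f = obj[enter]
        obj = [a - f * p for a, p in zip(obj, tableau[leave])]
        basis[leave] = enter

    # Primal x_j equals the reduced cost of slack s_j; obj[-1] is the optimal value.
    x = [obj[n + j] for j in range(m)]
    return x, obj[-1]
```

For the toy intervals [10, 12], [12, 14], [14, 16], the optimal total is 26; by strong duality, the returned value equals \(\sum _j b_jx_j\) for the recovered x.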

3 Proposed Models

This section introduces two models. In the first phase, Chen’s model [8] is modified to obtain the forecasted values; this initial model is termed the High-Order FTS Model (HOFTSM). In the second phase, an LP model is formulated and integrated with the HOFTSM to obtain optimal interval lengths; this model is referred to as the High-Order FTS-LP Model (HOFTS-LPM).

Table 1. TAIEX index data set.
Table 2. Intervals along with their corresponding mid-values for the TAIEX index data set.

3.1 High-Order FTS Model (HOFTSM)

The HOFTSM is simulated using the historical TAIEX index data set [6] (see Table 1). Each step of the model is presented next.

Table 3. Fuzzified TAIEX index data set and their corresponding DMs.
Table 4. Fourth-order FLRs for the TAIEX index data set.
  1. Step 1.

    Define the boundary of the historical time series data set through the universe of discourse U, as \(U=[A_{min}-M_1,A_{max}+M_2]\), where \(A_{min}\) and \(A_{max}\) are the minimum and maximum values of the historical time series data set, and \(M_1\) and \(M_2\) are two positive numbers. From Table 1, \(A_{min}=3327.70\) and \(A_{max}=3776.60\). Choosing \(M_1=2\) and \(M_2=5\) gives the universe of discourse U = [3325.70, 3781.60].

  2. Step 2.

    Discretize the universe of discourse U into n intervals of equal length based on Eq. (9):

    $$\begin{aligned} a_i=[L_B+(i-1)\frac{U_B-L_B}{j},L_B+i\frac{U_B-L_B}{j}] \end{aligned}$$
    (9)

    for \(i=1,2,\ldots ,n\), where j (= n) is the number of intervals used in the simulation. Here, \(L_B\) = 3325.70, \(U_B\) = 3781.60, and \(j=12\), so each interval has length \((3781.60-3325.70)/12 \approx 37.99\). In this study, at most 12 intervals are used, because with more intervals the whole sample tends to be converted into crisp values, which would violate the premise of the FTS modeling approach. All these intervals, their corresponding data, and mid-values are listed in Table 2.

  3. Step 3.

    Define a fuzzy linguistic variable \(A_i\) for each of the defined intervals. Since 12 intervals are defined, 12 fuzzy linguistic variables are defined on U for the TAIEX index data set: \(A_1\) (very low), \(A_2\) (not very low), \(\ldots \), \(A_{12}\) (very very high).

  4. Step 4.

    Obtain the DM of each historical TSV on U using the triangular membership function, defined as [9]:

    $$\begin{aligned} f(X_i;L_B,U_B)=\frac{X_i-L_B}{U_B-L_B},~~L_B \le X_i \le U_B \end{aligned}$$
    (10)

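    Here, \(X_i\) denotes the historical TSV of the ith day, and \(L_B\) and \(U_B\) are the bounds of U, so Eq. (10) maps each TSV linearly onto [0, 1] (the rising leg of a triangular function that peaks at \(U_B\)).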

  5. Step 5.

    Fuzzify each historical TSV by assigning it the fuzzy linguistic variable \(A_i\) of the interval \(a_i\) that contains it. The fuzzified TSVs, their corresponding DMs, and mid-values are listed in Table 3.

  6. Step 6.

    Establish the high-order FLRs based on Eq. (2). Here, fourth-order FLRs are established between the fuzzified TSVs. For example, in Table 3, the fuzzified TSVs for days 1/12/1992, 2/12/1992, 3/12/1992, 4/12/1992, and 5/12/1992 are \(A_{9}\), \(A_{9}\), \(A_{8}\), \(A_{9}\), and \(A_{11}\), respectively. To establish the fourth-order FLR among these fuzzified TSVs, \(A_{11}\) is assumed to be caused by the previous four fuzzified TSVs \(A_{9}\), \(A_{9}\), \(A_{8}\), and \(A_{9}\). Hence, the fourth-order FLR is:

    $$\begin{aligned} A_{9}, A_{9}, A_{8}, A_{9} \rightarrow A_{11} \end{aligned}$$
    (11)

    The remaining fourth-order FLRs are obtained in the same manner and are listed in Table 4. In this table, each symbol “?” denotes the unknown output for the day t given inside the symbol “\(\langle \rangle \)”, which is determined by the proposed model.

  7. Step 7.

    Defuzzify the historical TSVs and obtain the forecasted values as follows:

    • First, obtain the nth-order FLR for forecasting G(t):

      $$\begin{aligned} A_{tn} , A_{t(n-1)},\ldots , A_{t1} \rightarrow ?\langle t \rangle , \end{aligned}$$
      (12)

      where “t” is the day for which the forecasted value is to be obtained, and “n” is the order of the FLR (\(n \ge 4\)). Here, \(A_{tn}, A_{t(n-1)},\ldots ,\) and \(A_{t1}\) are the fuzzified TSVs of the previous days, i.e., of \(G(t-n), \ldots , G(t-2)\), and \(G(t-1)\).

    • Find the intervals associated with the fuzzy linguistic variables \(A_{tn}, A_{t(n-1)},\ldots ,\) and \(A_{t1}\); let these intervals be \(a_n, a_{n-1}, \ldots , a_1\), respectively, and let \(P_n, P_{n-1}, \ldots , P_1\) be their mid-points.

    • Replace each of the previous state’s fuzzified TSVs in Eq. (12) with its corresponding mid-point:

      $$\begin{aligned} P_{n},P_{n-1},\ldots ,P_{1} \rightarrow ?\langle t \rangle , n \ge 4 \end{aligned}$$
      (13)
    • Obtain the DMs of the historical TSVs corresponding to the fuzzy linguistic variables in Eq. (12):

      $$\begin{aligned} D_{n},D_{n-1},\ldots ,D_{1} \rightarrow ?\langle t \rangle , n \ge 4 \end{aligned}$$
      (14)
    • Use the following formula to compute the desired output “?” for the corresponding day “t”:

      $$\begin{aligned} Forecast(t) = \frac{\sum \limits _{i=1}^{N}P_iD_i}{\sum \limits _{i=1}^{N}D_i} \end{aligned}$$
      (15)

      Here, N (= n) is the number of mid-points \(P_i\) used, and each \(D_i\) is the DM of the TSV corresponding to each fuzzy linguistic variable.
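Steps 1–7 above can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: intervals are taken as left-closed and right-open (the last one closed), which the text does not specify; \(M_1=2\) and \(M_2=5\) follow Step 1; Eq. (10) is evaluated with the bounds of U; and the \(D_i\) in Eq. (15) are the DMs of the previous n days' TSVs. All function names are hypothetical. Table 1 is not reproduced here, so the example uses only \(A_{min}\), \(A_{max}\), and the FLR example of Step 6.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def universe(a_min: float, a_max: float, m1: float = 2.0, m2: float = 5.0) -> Tuple[float, float]:
    """Step 1: U = [A_min - M_1, A_max + M_2]."""
    return a_min - m1, a_max + m2


def make_intervals(lb: float, ub: float, j: int) -> List[Interval]:
    """Step 2, Eq. (9): j equal-length intervals; the last upper bound is pinned to U_B."""
    width = (ub - lb) / j
    cuts = [lb + i * width for i in range(j)] + [ub]
    return list(zip(cuts[:-1], cuts[1:]))


def degree_of_membership(x: float, lb: float, ub: float) -> float:
    """Step 4, Eq. (10): linear DM of a TSV over the universe of discourse."""
    return (x - lb) / (ub - lb)


def fuzzify(x: float, intervals: Sequence[Interval]) -> int:
    """Step 5: index k of A_k whose interval contains x (assumed [l, u), last one closed)."""
    for k, (lo, hi) in enumerate(intervals, start=1):
        if lo <= x < hi or (k == len(intervals) and x == hi):
            return k
    raise ValueError(f"{x} lies outside the universe of discourse")


def high_order_flrs(labels: Sequence[int], order: int = 4):
    """Step 6, Eq. (2): (A_{t-n}, ..., A_{t-1}) -> A_t for every day t with enough history."""
    return [(tuple(labels[t - order:t]), labels[t]) for t in range(order, len(labels))]


def forecast(previous: Sequence[float], intervals: Sequence[Interval], lb: float, ub: float) -> float:
    """Step 7, Eq. (15): DM-weighted mean of the mid-points of the previous days' intervals."""
    mids = [(lo + hi) / 2.0 for lo, hi in intervals]
    p = [mids[fuzzify(v, intervals) - 1] for v in previous]   # P_n, ..., P_1
    d = [degree_of_membership(v, lb, ub) for v in previous]   # D_n, ..., D_1
    total = sum(d)
    if total == 0.0:
        raise ValueError("all DMs are zero; Eq. (15) is undefined")
    return sum(pi * di for pi, di in zip(p, d)) / total
```

With \(A_{min}=3327.70\) and \(A_{max}=3776.60\), `universe` returns [3325.70, 3781.60] and the first interval is approximately [3325.70, 3363.69], as in Sect. 3.2. Calling `high_order_flrs([9, 9, 8, 9, 11])` reproduces the FLR of Eq. (11). To forecast the day 5/12/1992, `forecast` would receive the TSVs of the four preceding days from Table 1.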

3.2 High-Order FTS-LP Model (HOFTS-LPM)

To improve the forecasting accuracy of the HOFTSM, an LP model is formulated and integrated with it. The main intent of this LP model is to select appropriate interval lengths by minimizing the proximities between the lower and upper bounds of the intervals (Tables 5 and 6).

  1. Step 1.

    Perform Steps 1–7 of the HOFTSM (presented in Subsect. 3.1).

  2. Step 2.

    Define the OF and constraints based on Eqs. (3) and (4), respectively. In the HOFTSM, the universe of discourse U = [3325.70, 3781.60] is partitioned into 12 intervals of equal length: \(a_1\) = [3325.70, 3363.69], \(a_2\) = [3363.69, 3401.68], \(\ldots \), \(a_{12}\) = [3743.61, 3781.60]. Each \(a_i\) can be written as \(a_i = [l_i,u_i]\), where \(l_i\) and \(u_i\) are the lower and upper bounds of the interval. Based on these bounds, the LP model is formulated as follows.

    The LP model formulation. Let \(x_i\) be the effective length of interval required to maintain the proximity (\(i=1,2,\ldots ,13\)).

    The LP model. The OF is defined as:

    $$\begin{aligned} \text{ Min } \text{(total } \text{ lengths) } \text{ Z } = 3325.70x_1 + 3363.69x_2 + 3401.68x_3 + \cdots + 3781.60x_{13} \end{aligned}$$
    (16)

    subject to the constraints

    $$\begin{aligned} 3325.70x_1 + 3363.69x_2&\ge 3344.70 \nonumber \\ 3363.69x_2 + 3401.68x_3&\ge 3382.69 \nonumber \\ 3401.68x_3 + 3439.68x_4&\ge 3420.68 \nonumber \\ \vdots \nonumber \\ 3743.61x_{12} + 3781.60x_{13}&\ge 3762.60 \end{aligned}$$
    (17)

    and \(x_1, x_2, \cdots , x_{13} \ge 0\).

  3. Step 3.

    Solve the LP model defined in Step 2 for \(x_i\) using the simplex method.

  4. Step 4.

    Compute the proximities for each interval as:

    $$\begin{aligned} x_i(new) = x_i(old) + Rand(-c_v,c_v) \end{aligned}$$
    (18)

    Here, \(Rand(-c_v,c_v)\) returns a random value in the range \([-c_v,c_v]\), where \(c_v\) is a user-defined constant.

  5. Step 5.

    Update the set of intervals, as:

    $$\begin{aligned} a_1(new)&= [l_1(old)+x_1(new),u_1(old)+x_2(new)] \nonumber \\ a_2(new)&= [l_2(old),u_2(old)+x_3(new)] \nonumber \\ \vdots \nonumber \\ a_n(new)&= [l_n(old),u_n(old)+x_{n+1}(new)] \end{aligned}$$
    (19)
  6. Step 6.

    Repeat Steps 1–5 until the optimal solution is found.
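The interval-update part of the HOFTS-LPM (Eqs. (16), (18), and (19)) can be sketched in Python as follows. This is a minimal sketch, assuming that, following the pattern of the first rows of Eq. (19), each upper bound \(u_i\) is shifted by \(x_{i+1}(new)\) and only \(a_1\) moves its lower bound. The helper names, the value \(c_v = 0.5\), and the random seed are hypothetical illustrations, and \(x(old)\) would in practice come from the simplex solution of Step 3 rather than the zeros used here.

```python
import random
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def equal_intervals(lb: float, ub: float, j: int) -> List[Interval]:
    """Equal-length partition of U (as in Step 2 of the HOFTSM)."""
    width = (ub - lb) / j
    cuts = [lb + i * width for i in range(j)] + [ub]
    return list(zip(cuts[:-1], cuts[1:]))


def objective_coefficients(intervals: Sequence[Interval]) -> List[float]:
    """Coefficients of Eq. (16): l_1, u_1, u_2, ..., u_n for x_1, ..., x_{n+1}."""
    return [intervals[0][0]] + [hi for _, hi in intervals]


def perturb(x_old: Sequence[float], cv: float, rng: random.Random) -> List[float]:
    """Eq. (18): x_i(new) = x_i(old) + Rand(-c_v, c_v)."""
    return [xi + rng.uniform(-cv, cv) for xi in x_old]


def update_intervals(intervals: Sequence[Interval], x_new: Sequence[float]) -> List[Interval]:
    """Eq. (19): shift l_1 by x_1(new) and every u_i by x_{i+1}(new)."""
    if len(x_new) != len(intervals) + 1:
        raise ValueError("expected n + 1 values x_1, ..., x_{n+1}")
    updated = []
    for i, (lo, hi) in enumerate(intervals):
        new_lo = lo + x_new[0] if i == 0 else lo   # only a_1 moves its lower bound
        updated.append((new_lo, hi + x_new[i + 1]))
    return updated


# Hypothetical usage: c_v = 0.5 and seed 0 are illustrative, not values from the paper.
iv = equal_intervals(3325.70, 3781.60, 12)
rng = random.Random(0)
x_new = perturb([0.0] * 13, 0.5, rng)   # x(old) would come from the simplex solution (Step 3)
new_iv = update_intervals(iv, x_new)
```

`objective_coefficients(iv)` reproduces the 13 coefficients of Eq. (16), from 3325.70 up to 3781.60. Seeding a dedicated `random.Random` instance keeps each iteration of Step 6 reproducible.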

Table 5. A sample of intervals produced by the HOFTS-LPM for the TAIEX index data set.
Table 6. Forecasting results of TAIEX index data set for 5 different iterations using HOFTS-LPM (based on 4th-order FLRs).

4 Empirical Analyses

The performance of the two proposed models is evaluated using two metrics, namely the root mean square error (RMSE) and the average forecasting error rate (AFER) [10]. The two models are compared based on their forecasting results for the TAIEX index data set. During the simulation process, 12 intervals are used, and experimental results are obtained with 4th-order to 7th-order FLRs. Comparison results, in terms of the average RMSE, are presented in Table 7, which shows that the proposed HOFTS-LPM outperforms the HOFTSM.
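The two metrics can be computed as in the Python sketch below. RMSE follows its standard definition; AFER is sketched in the form commonly used in the FTS literature (the mean of \(|F_t - A_t|/A_t\), expressed in percent), and that this matches the definition in [10] exactly is an assumption. The function names `rmse` and `afer` are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean square error over paired actual and forecasted TSVs."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def afer(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Average forecasting error rate in percent: mean of |F - A| / A, times 100."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs(f - a) / a for a, f in zip(actual, forecast)) / len(actual)
```

For actual values [100, 200] and forecasts [110, 190], the RMSE is 10 and the AFER is 7.5%.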

Table 7. Performance analysis of the proposed models (in terms of Average RMSE) for different orders of FLRs (with number of intervals = 12).
Table 8. Comparison of the proposed HOFTS-LPM with existing FTS models.

The forecasting accuracy of the proposed HOFTS-LPM is compared with that of existing FTS models [6, 8, 11,12,13]. In this comparison, the forecasted values for the TAIEX index data set are obtained with 12 intervals, and the forecasted values of the HOFTS-LPM are obtained using 4th-order FLRs. Comparison results are presented in Table 8. The smaller RMSE and AFER values of the HOFTS-LPM indicate that its forecasting accuracy is better than that of the competing models considered.

5 Conclusion

In this study, two models are proposed to improve the predictive skill of FTS models on one-factor time series data sets. The initial model, termed the HOFTSM, is a modification of Chen’s model [8]. In this model, equal-sized intervals are initially used to fuzzify the historical time series data set, and the model is simulated using high-order FLRs. To search for better results, this study further integrates an LP model with the HOFTSM; the resulting model is referred to as the HOFTS-LPM. In the HOFTS-LPM, the solutions of the integrated LP model are obtained using the simplex method. Both models are verified and validated with the historical time series data set of the TAIEX index. The empirical analyses show that the HOFTS-LPM has better predictive skill than the HOFTSM.