Introduction

An accurate inventory reconciliation model plays an important role in improving fault diagnosis capability and in reducing the rates of misdiagnosis and missed detection during the storage and transportation process (Chen et al. 2010; United States Environmental Protection Agency 1995). The error of inventory reconciliation is related to source measurements, terminal measurements and process loss. In theory, it obeys a normal distribution with zero mean. In practice, different measurement systems use different instruments based on various principles, and these instruments differ in accuracy, nonlinearity, zero drift and other characteristics, which are in turn affected by the internal and external environment. Because the sources and terminals are measured by different methods, and because of the influence of process technology, process parameters, process loss and environmental changes, the error of inventory reconciliation actually obeys a normal distribution with nonzero mean. It is difficult to establish a precise error prediction model of inventory reconciliation from first principles.

In recent years, machine learning methods represented by neural networks and support vector machines have been widely used for instrument error modeling, forecasting and related tasks (Austina et al. 2013; Wang et al. 2015). He et al. (2014) used a GM (1, N) model optimized by a neural network to correct the nonlinear error of a sensor, so that the corrected sensor had the desired input and output characteristics. Peng et al. (2013) used a BP neural network optimized by a genetic algorithm to solve the problem of sensor temperature compensation. Pang et al. (2012) improved the performance of fluxgate magnetometers by using an RBF neural network to establish a compensation model for the bias and scale factors. Ye et al. (2016) and Zhang et al. (2012) used the least squares support vector machine (LSSVM) to predict and compensate the temperature error of instruments, and explored parameter optimization for LSSVM. However, these papers dealt only with error compensation or estimation for a single sensor and did not address the error between different metering systems. How to establish an error prediction model that makes full use of the collected and stored process data has become an important research topic for the storage and transportation process.

To address the inventory reconciliation error caused by different metering systems, this paper proposes an error prediction method for inventory reconciliation during the storage and transportation process, based on partial least squares (PLS) and LSSVM optimized by the modified fruit fly optimization algorithm (MFOA-LSSVM). The method first uses PLS to extract features from the key factors of the inventory reconciliation model equations. Then LSSVM optimized by MFOA, which helps avoid falling into local optima, is adopted for modeling. Finally, the validity of the method is verified by an oil storage and transportation experiment on an advanced process control experimental platform. To simplify the calculation, the error prediction model is established for a pump frequency of 42.5 Hz.

The basic principles of prediction method

The flow of prediction method

The general error prediction flow of inventory reconciliation is provided in Fig. 1:

Fig. 1

Error prediction flow of inventory reconciliation during storage and transportation process based on PLS and MFOA-LSSVM

First, the source and terminal metering systems are analyzed, the formula for the inventory reconciliation error under the dynamic steady condition is established, and experimental data for the key factors are collected. Next, gross errors in the data are eliminated with the \(3\sigma\) rule and the data are normalized, so that the trained LSSVM model is not affected by gross errors or by differences in dimension. Then PLS extracts the principal components of the independent variables that are relevant to the dependent variable, which eliminates redundant or irrelevant data and reduces the amount of input data. After that, the LSSVM parameters are optimized with MFOA on the training data and the LSSVM prediction model is established. Finally, the test data are fed into the prediction model to obtain the predicted error.
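
The gross error screening and normalization steps of this flow can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the \(3\sigma\) rule is applied to the output variable only and that normalization means z-score standardization, and the function names `remove_gross_errors` and `standardize` are hypothetical.

```python
import numpy as np

def remove_gross_errors(X, y):
    """Drop samples whose output is more than 3 standard deviations from the mean.

    Assumption: the 3-sigma rule is applied to the output y only.
    """
    mu, sigma = y.mean(), y.std()
    keep = np.abs(y - mu) <= 3.0 * sigma
    return X[keep], y[keep]

def standardize(X):
    """Scale each column to zero mean and unit variance (assumed normalization)."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # leave constant columns unscaled
    return (X - mean) / std
```

Screening before scaling keeps a single outlier from distorting the mean and standard deviation used for standardization.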

The error of inventory reconciliation and influential factors

According to the source and terminal of storage and transportation process, the error of inventory reconciliation can be established as formula (1).

$$\begin{aligned} & V_{\text{e}} = \sum\limits_{i = 1}^{n} {V_{{{\text{T}}i}} } + \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {V_{{{\text{loss}}ij}} } } - \sum\limits_{j = 1}^{m} {V_{{{\text{S}}j}} } \\ & V_{\text{e}} \sim N\left( {\mu ,\sigma^{2} } \right) \\ \end{aligned}$$
(1)

where \(V_{\text{e}}\) is the error of inventory reconciliation, \(V_{{{\text{loss}}ij}}\) is the process loss from the \(j\)th source to the \(i\)th terminal, \(V_{{{\text{T}}i}}\) is the amount of media changed in the \(i\)th terminal, \(V_{{{\text{S}}j}}\) is the amount of media changed in the \(j\)th source, and \(n\) and \(m\) are the numbers of terminals and sources, respectively.
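
As an illustration of formula (1), the following minimal sketch sums the terminal changes and process losses and subtracts the source changes. The function name `reconciliation_error` and the array layout (one row per terminal, one column per source for the losses) are assumptions made for this example.

```python
import numpy as np

def reconciliation_error(v_terminal, v_loss, v_source):
    """Formula (1): terminal changes plus process losses minus source changes."""
    v_terminal = np.asarray(v_terminal, dtype=float)  # one entry per terminal i
    v_loss = np.asarray(v_loss, dtype=float)          # row i: terminal, column j: source
    v_source = np.asarray(v_source, dtype=float)      # one entry per source j
    if v_loss.shape != (v_terminal.size, v_source.size):
        raise ValueError("v_loss must have shape (terminals, sources)")
    return v_terminal.sum() + v_loss.sum() - v_source.sum()
```

For two terminals that received 10 and 5 units, losses of 0.25 units on each path and a single source that delivered 16 units, the function returns -0.5.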

\(V_{\text{e}}\) obeys a normal distribution with mean \(\mu\). The value of \(\mu\) depends on the measurement accuracy of \(V_{{{\text{loss}}ij}}\), \(V_{{{\text{T}}i}}\) and \(V_{{{\text{S}}j}}\). The measurement accuracy of \(V_{{{\text{loss}}ij}}\) is related to pipe diameter, length, friction, medium velocity and other factors. The terminal and the source can be measured by level gauges, pressure sensors and flow meters. The measurement accuracy of \(V_{{{\text{T}}i}}\) and \(V_{{{\text{S}}j}}\) is affected by the internal mechanical structure of the metering instruments and by the external environment, and it is related to medium temperature, ambient temperature, level, flow and other factors.

Feature extraction by PLS

Assume that influential factors (independent variables) are \(X \in R^{n \times p}\) and the inventory reconciliation error (dependent variable) is \(Y \in R^{n \times 1}\).

$$X = \left[ {\begin{array}{*{20}c} {x_{11} } & {x_{12} } & \cdots & {x_{1p} } \\ {x_{21} } & {x_{22} } & \cdots & {x_{2p} } \\ \vdots & \vdots & {} & \vdots \\ {x_{n1} } & {x_{n2} } & \cdots & {x_{np} } \\ \end{array} } \right],\quad Y = \left[ {\begin{array}{*{20}c} {y_{1} } \\ {y_{2} } \\ \vdots \\ {y_{n} } \\ \end{array} } \right]$$
(2)

where n is the number of samples and \(p\) is the number of influential factors.

To obtain the principal components \(t_{i}\) (\(i = 1,2, \ldots ,q\), where \(q\) is the number of principal components) that not only represent the independent variables but also explain the dependent variable as much as possible, a nonlinear iterative algorithm is needed (You et al. 2013). The specific steps are as follows:

  • Step 1: Center \(X\) and \(Y\), and denote the centered matrices by \(E_{0}\) and \(F_{0}\), respectively. Let \(i = 1\).

  • Step 2: Calculate the weight vector \(w_{i}\), score vector \(t_{i}\), loading vector \(P_{i}\) and internal regression coefficient \(r_{i}\).

  • $$w_{i} = E_{i - 1}^{'} F_{i - 1} /\left\| {E_{i - 1}^{'} F_{i - 1} } \right\|$$
    (3)
  • $$t_{i} = E_{i - 1} w_{i}$$
    (4)
  • $$P_{i} = E_{i - 1}^{'} t_{i} /\left\| {t_{i} } \right\|^{2}$$
    (5)
  • $$r_{i} = F_{i - 1}^{'} t_{i} /\left\| {t_{i} } \right\|^{2}$$
    (6)
  • Step 3: Let \(E_{i} = E_{i - 1} - t_{i} P_{i} '\) and \(F_{i} = F_{i - 1} - r_{i} t_{i}\).

  • Step 4: Calculate \({\text{press}}_{i}\), \({\text{ss}}_{i}\) and \(Q_{i}^{2}\) according to cross validation.

  • $${\text{press}}_{i} = \sum\limits_{z = 1}^{n} {\left( {y_{z} - \hat{y}_{i( - z)} } \right)^{2} }$$
    (7)
  • $${\text{ss}}_{i} = \sum\limits_{z = 1}^{n} {(y_{z} - \hat{y}_{iz} )^{2} }$$
    (8)
  • $$Q_{i}^{2} = 1 - {\text{press}}_{i} /{\text{ss}}_{i}$$
    (9)
  • where \(\hat{y}_{i( - z)}\) is the estimate of the deleted point \(z\) given by the regression model with \(i\) principal components.

  • Step 5: The new score vector \(t_{i}\) significantly improves the performance of the extracted components when \(Q_{i}^{2} \ge 0.0975\) (Abdi and Williams 2010). Check whether the inequality \(Q_{i}^{2} \ge 0.0975\) holds. If it does, let \(i = i + 1\) and return to Step 2 to continue the calculation. Otherwise, output the principal components \(t_{i}\) and the weight vectors \(w_{i}\).
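
Steps 1 to 3 above can be sketched for a single dependent variable as follows. This is a minimal illustration under stated assumptions: the number of components is fixed by the caller instead of being chosen by the cross-validation rule of Steps 4 and 5, the loading vector and regression coefficient are computed with the transposed products that make the dimensions agree, and the function name `pls_components` is hypothetical.

```python
import numpy as np

def pls_components(X, y, n_components):
    """Extract PLS components for a single response (Steps 1 to 3 only)."""
    E = X - X.mean(axis=0)          # Step 1: centered inputs E_0
    F = y - y.mean()                # Step 1: centered output F_0
    W, T, P, r = [], [], [], []
    for _ in range(n_components):
        w = E.T @ F
        w = w / np.linalg.norm(w)   # Eq. (3): unit-length weight vector
        t = E @ w                   # Eq. (4): score vector
        tt = t @ t
        p = E.T @ t / tt            # Eq. (5): loading vector
        ri = (F @ t) / tt           # Eq. (6): internal regression coefficient
        E = E - np.outer(t, p)      # Step 3: deflate the inputs
        F = F - ri * t              # Step 3: deflate the output
        W.append(w)
        T.append(t)
        P.append(p)
        r.append(ri)
    return (np.column_stack(W), np.column_stack(T),
            np.column_stack(P), np.array(r))
```

Because each component is computed from the deflated matrices, the score vectors are mutually orthogonal, which is what removes the collinearity among the original inputs.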

The MFOA-LSSVM prediction model of inventory reconciliation error

LSSVM replaces the inequality constraints of SVM with equality constraints, which reduces the complexity of the model and improves its performance (Miranian and Abdollahzade 2013; Mellit et al. 2013).

The training data are \((x_{i} ,y_{i} )_{n\, \times \,(p + 1)}\), where \(n\) is the number of samples and \(p\) is the dimension of the input variables. Using LSSVM for data fitting or classification is equivalent to solving the optimization problem shown in formula (10):

$$\left\{ {\begin{array}{*{20}l} {\hbox{min} \;J(\omega ,b,\varepsilon ) = \frac{1}{2}\omega^{T} \omega + \frac{C}{2}\sum\limits_{i = 1}^{n} {\varepsilon_{i}^{2} } } \hfill \\ {y_{i} = \omega^{T} \varphi \left( {x_{i} } \right) + b + \varepsilon_{i} \quad (i = 1,\,2,\, \ldots ,\,n)} \hfill \\ \end{array} } \right.$$
(10)

where \(C\) is the penalty factor, \(\omega\) is the weight vector, \(b\) is the bias, \(\varepsilon_{i}\) is the error variable, and \(\varphi (x_{i} )\) is the mapping function.

Construct the Lagrangian function according to formula (10):

$$L(\omega ,b,\varepsilon ,\alpha ) = J(\omega ,b,\varepsilon ) - \sum\limits_{i = 1}^{n} {\alpha_{i} \left[ {\omega^{T} \varphi (x_{i} ) + b + \varepsilon_{i} - y_{i} } \right]}$$
(11)

where \(\alpha = [\alpha_{1} ,\alpha_{2} , \ldots ,\alpha_{n} ]^{T}\) is the vector of Lagrange multipliers.

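Setting the partial derivatives of the Lagrangian function with respect to \(\omega\), \(b\), \(\varepsilon_{i}\) and \(\alpha_{i}\) to zero and eliminating \(\omega\) and \(\varepsilon_{i}\) yields formula (12):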

$$\left[ {\begin{array}{*{20}c} 0 & {1_{n \times 1}^{T} } \\ {1_{n \times 1} } & {\varOmega + \frac{{E_{n \times n} }}{C}} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} b \\ \alpha \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 \\ y \\ \end{array} } \right]$$
(12)

where \(1_{n \times 1}\) is an \(n\)-dimensional column vector of ones, \(E_{n \times n}\) is the identity matrix of order \(n\), \(\varOmega_{ij} = \varphi \left( {x_{i} } \right)^{T} \varphi \left( {x_{j} } \right) = K\left( {x_{i} ,x_{j} } \right)\), and \(K\left( {x_{i} ,x_{j} } \right)\) is the kernel function.

Solve formula (12) for \(b\) and \(\alpha\) to obtain the model of LSSVM:

$$y(x) = \sum\limits_{i = 1}^{n} {\alpha_{i} K\left( {x,x_{i} } \right)} + b$$
(13)

Common kernel functions include the linear, polynomial, Gaussian radial basis and sigmoid kernel functions. When selecting a kernel function, its ability to handle nonlinearity and its number of undetermined parameters need to be considered. In this paper, the Gaussian radial basis kernel function is selected, that is, \(K\left( {x_{i} ,x_{j} } \right) = \exp \left[ { - \left\| {x_{i} - x_{j} } \right\|^{2} /2\sigma^{2} } \right]\).
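
The linear system (12) and the model (13) with the Gaussian radial basis kernel can be sketched as follows. This is a minimal illustration that solves the dense system directly; the function names `rbf_kernel`, `lssvm_fit` and `lssvm_predict` are hypothetical and do not refer to any particular LSSVM toolbox.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """Gaussian radial basis kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2))

def lssvm_fit(X, y, C, sigma2):
    """Solve the linear system (12) for the bias b and the multipliers alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                    # top row: [0, 1^T]
    A[1:, 0] = 1.0                                    # left column: [0, 1]^T
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, sigma2, X_new):
    """Evaluate the model (13) at the rows of X_new."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```

Training reduces to one linear solve of size \(n + 1\), so only the two parameters \(C\) and \(\sigma^{2}\) remain to be tuned.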

When building the LSSVM model, it is difficult to select the penalty factor and the kernel parameter by experience alone, so an optimization algorithm is needed to tune these parameters and improve the error prediction. The fruit fly optimization algorithm (FOA) is an optimization method based on the foraging behavior of fruit flies, proposed by W. T. Pan of Taiwan in 2012 (Pan 2012; Dai et al. 2014; Si et al. 2016). Because the basic FOA easily falls into local optima, the modified fruit fly optimization algorithm (MFOA), which requires little computation and has a good global search capability, is adopted to select the LSSVM model parameters. The specific steps of the three-dimensional improved fruit fly algorithm with diminishing steps are as follows.

  • Step 1: Initialize the parameters: maximum number of iterations (maxgen), population size (popsize), maximum step (\(L_{\hbox{max} }\)), minimum step (\(L_{\hbox{min} }\)), and the initial position of fruit flies \(X(i)\), \(Y(i)\) and \(Z(i)\), as well as the best location \(X\_{\text{axis}}\), \(Y\_{\text{axis}}\) and \(Z\_{\text{axis}}\). Let the iteration \({\text{gen}} = 1\).

  • Step 2: Set the directions and distances of foraging for fruit flies.

  • $$\begin{array}{*{20}l} {X(i,:) = X\_{\text{axis}} + L*{\text{rands}}(1,2)} \hfill \\ {Y(i,:) = Y\_{\text{axis}} + L*{\text{rands}}(1,2)} \hfill \\ {Z(i,:) = Z\_{\text{axis}} + L*{\text{rands}}(1,2)} \hfill \\ {L = L_{\hbox{max} } - \frac{{(L_{\hbox{max} } - L_{\hbox{min} } )*{\text{gen}}^{2} }}{{{\text{maxgen}}^{2} }}} \hfill \\ \end{array}$$
    (14)
  • where \(L\) is the step length of the fruit flies.

  • Step 3: Calculate the concentration value \(S(i,1)\) and \(S(i,2)\) of fruit fly individuals.

  • $$\begin{aligned} S(i,1) = \frac{1}{{\sqrt {X(i,1)^{2} + Y(i,1)^{2} + Z(i,1)^{2} } }} \hfill \\ S(i,2) = \frac{1}{{\sqrt {X(i,2)^{2} + Y(i,2)^{2} + Z(i,2)^{2} } }} \hfill \\ \end{aligned}$$
    (15)
  • Step 4: Let \(C = S(i,1)\), \(\sigma^{2} = S(i,2)\) in LSSVM and train the LSSVM with the training data. Take the mean square error (MSE) of the test samples as the smell concentration value, that is, \({\text{smell}}(i) = {\text{MSE}}(i)\).

  • Step 5: Find the fruit fly with the minimum concentration value: \([{\text{bestsmell}},{\text{index}}] = \hbox{min} ({\text{smell}})\). Let \(X\_{\text{axis}} = X({\text{index}},:)\), \(Y\_{\text{axis}} = Y({\text{index}},:)\) and \(Z\_{\text{axis}} = Z({\text{index}},:)\).

  • Step 6: Let \({\text{gen}} = {\text{gen}} + 1\) and repeat Steps 2–5 until the maximum number of iterations is reached. Output the location of the optimum concentration and the model of LSSVM.
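
The six steps above can be sketched as a generic two-parameter search. This is a minimal illustration under stated assumptions: `objective` is a hypothetical stand-in for training LSSVM with \(C\) and \(\sigma^{2}\) and returning the MSE, the initial swarm location range is arbitrary, and the swarm location is moved only when a fly improves on the best value found so far, as in the standard fruit fly algorithm.

```python
import numpy as np

def mfoa(objective, maxgen=100, popsize=20, l_max=1.0, l_min=0.01, seed=0):
    """Three-dimensional fruit fly search with a diminishing step, Eqs. (14)-(15)."""
    rng = np.random.default_rng(seed)
    # Step 1: swarm location, rows are X, Y, Z and columns are the two parameters.
    axis = rng.uniform(0.5, 1.5, size=(3, 2))   # assumed initial range
    best_smell, best_s, history = np.inf, None, []
    for gen in range(1, maxgen + 1):
        # Eq. (14): the step shrinks quadratically from l_max to l_min.
        step = l_max - (l_max - l_min) * gen**2 / maxgen**2
        # Step 2: random flight around the swarm location, offsets in [-1, 1].
        pos = axis[None, :, :] + step * rng.uniform(-1.0, 1.0, size=(popsize, 3, 2))
        # Step 3, Eq. (15): concentration is the reciprocal distance to the origin.
        s = 1.0 / np.sqrt((pos**2).sum(axis=1))  # shape (popsize, 2)
        # Step 4: evaluate each candidate pair (C, sigma^2).
        smell = np.array([objective(c, s2) for c, s2 in s])
        # Step 5: move the swarm to the best fly if it improves the best value.
        idx = int(np.argmin(smell))
        if smell[idx] < best_smell:
            best_smell, best_s = smell[idx], s[idx].copy()
            axis = pos[idx].copy()
        history.append(best_smell)
    return best_s, best_smell, history
```

The large early steps favor global exploration, while the small late steps refine the search around the best location found.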

Experimental modeling of inventory reconciliation error prediction

Experimental preparation

The experiments were carried out on a THJ-4 advanced process control system platform (Fig. 2), which simulates oil transfer operations with water instead of oil. The detectors on the experimental platform are a diffused silicon pressure transmitter, a Pt100 temperature sensor and a turbine flow meter. The monitoring system, shown in Fig. 3, uses the MCGS configuration software.

Fig. 2

Simulation platform

Fig. 3

Monitoring system of storage and transportation process

The calculation of inventory reconciliation error

In the experiment, water was transported from the storage tank to the medium tank via the pump and flow meter. Under the dynamic steady condition, the inventory reconciliation equation can be written as formula (16).

$$V_{\text{e}} = V_{\text{in}} + V_{\text{loss}} - V_{\text{out}}$$
(16)

where \(V_{\text{in}}\) is the volume change in the medium tank, \(V_{\text{loss}}\) is the volume of media lost, and \(V_{\text{out}}\) is the volume measured by the flow meter. The error \(V_{\text{e}}\) is calculated over a time interval \(\Delta t\), within which \(V_{\text{loss}}\) can be neglected:

$$V_{\text{e}} = \frac{{\pi D^{2} }}{4}\left( {h_{2} - h_{1} } \right) - \int_{t}^{t + \Delta t} {v\,{\text{d}}t}$$
(17)

where \(D\) is the diameter of the medium tank, \(h_{1}\) is the water level at time \(t\), \(h_{2}\) is the water level at time \(t + \Delta t\), and \(v\) is the flow rate measured by the flow meter.
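
Equation (17) can be evaluated from sampled data as in the following minimal sketch. It assumes consistent units, equally spaced flow samples that cover the interval, and trapezoidal integration; the function name `interval_error` and its arguments are hypothetical.

```python
import numpy as np

def interval_error(diameter, h1, h2, flow, dt):
    """Eq. (17): tank volume change minus the integrated flow meter reading."""
    tank_volume = np.pi * diameter**2 / 4.0 * (h2 - h1)
    flow = np.asarray(flow, dtype=float)
    # Trapezoidal rule over flow samples spaced dt apart.
    metered = dt * (flow[:-1] + flow[1:]).sum() / 2.0
    return tank_volume - metered
```

If the level and flow measurements agree exactly, the two terms cancel and the error is zero, so any nonzero value reflects measurement error or neglected loss.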

As can be seen from Eq. (17), factors that affect the error of inventory reconciliation model include flow error, level error and integration time. In addition, the medium temperature, ambient temperature and pump frequency will influence the error by affecting factors \(v\) and \(h\).

Data preparation

To meet the needs of status monitoring of the storage and transportation process, let \(\Delta t = 1\,{\text{s}}\). The operating frequency of the pump was set to 42.5, 45 and 47.5 Hz in turn. The ambient temperature varied between 20 and 25 °C, and the medium temperature varied between 25 and 30 °C. Measurements were repeated at different levels. The pump pressure, medium temperature, media velocity, pump frequency, water level of the medium tank and ambient temperature were selected as model inputs \(x_{1}\) to \(x_{6}\), and the calculated error was taken as the model output. A total of 25,907 sets of data were collected: 9167, 8803 and 7937 sets at pump frequencies of 42.5, 45 and 47.5 Hz, respectively.

The average error results of three different frequencies when the ambient temperature is 20 °C, the medium temperature is 30 °C, and the level ranges from 2 to 14 cm are shown in Fig. 4.

Fig. 4

Error of reconciliation model under different levels and frequency

At the frequency of 42.5 Hz, the average error results for different ambient temperature and medium temperature when the level ranges from 2 to 4 cm are shown in Fig. 5.

Fig. 5

Error of reconciliation model under different external temperature and medium temperature

As can be seen from Fig. 4, the error varies with the level at the same frequency, and at the same level the model error differs considerably between frequencies. These results show that the error of the inventory reconciliation model does exist and is related to pump frequency, media velocity and media level. As shown in Fig. 5, the error also varies with the medium temperature and the ambient temperature.

Data preprocessing

Gross errors in the data are eliminated according to the \(3\sigma\) rule. Eleven, ten and thirty-three gross errors were found at pump frequencies of 42.5, 45 and 47.5 Hz, respectively. The excluded gross errors are listed in Table 1, and the data before and after gross error elimination are compared in Fig. 6.

Table 1 Gross error

Fig. 6

Comparison of before and after gross error elimination

Feature extraction

The correlation coefficient between the input variables is calculated. As shown in Table 2, there is a high linear correlation between some input variables such as: \(x_{1}\) and \(x_{3}\), \(x_{3}\) and \(x_{4}\), \(x_{1}\) and \(x_{4}\).

Table 2 Correlation coefficient of input variables

Standardize the data and then adopt the PLS to extract the principal components.

The first principal component \(t_{1}\):

$$t_{1} = \left[ { - 0.56\;\; - 0.03\;\; - 0.53\;\; - 0.59\;\; - 0.18\;\; - 0.05} \right]X'$$
(18)

Because \(Q_{1}^{2} = 1 \ge 0.0975\), extraction of the second principal component \(t_{2}\) is continued.

$$t_{2} = \left[ { - 0.005\;\; - 0.23\;\;0.38\;\; - 0.06\;\; - 0.89\;\; - 0.10} \right]X'$$
(19)

Because \(Q_{2}^{2} = 0.0103 < 0.0975\), extraction of principal components is stopped.

Here \(X' = \left[ {x_{1} '\;\,x_{2} '\;\, \ldots \;\,x_{6} '} \right]^{T}\), and \(x_{i} '\) (\(i = 1,2, \ldots ,6\)) are the input variables after standardization.

Prediction model

To reduce the amount of calculation, the error prediction model was established only for the pump frequency of 42.5 Hz. Following the common 80/20 split of the dataset, every fifth sample of the principal component data (1831 samples in total) was taken as prediction data, and the remaining samples were used for training. During training, the parameters \(C\) and \(\sigma^{2}\) of LSSVM were optimized by MFOA. After 100 iterations, the optimized values were \(C = 0.0073\) and \(\sigma^{2} = 0.0096\). The training process is shown in Figs. 7 and 8.
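
The interval-based split can be sketched as follows. This is a minimal illustration: the function name `interval_split` is hypothetical, and which position within each group of five goes to the prediction set is an assumption (here the last one).

```python
import numpy as np

def interval_split(T, y, every=5):
    """Send every fifth sample to the prediction set and keep the rest for training."""
    idx = np.arange(len(y))
    is_test = (idx % every) == (every - 1)
    return T[~is_test], y[~is_test], T[is_test], y[is_test]
```

With the 9156 samples that remain at 42.5 Hz after removing the eleven gross errors from 9167, this rule gives 1831 prediction samples and 7325 training samples.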

Fig. 7

Optimization process of modified fruit fly algorithm

Fig. 8

Flight line of optimal fruit fly

Comparison of methods

The method detailed in this paper is taken as the base method for comparison and is referred to hereafter as method one. To verify its effectiveness, three other methods were selected for comparison: method two, least squares support vector machine optimized by the fruit fly algorithm (FOA-LSSVM); method three, partial least squares regression combined with least squares support vector machine optimized by the fruit fly algorithm (PLS and FOA-LSSVM); and method four, least squares support vector machine optimized by the particle swarm optimization algorithm (PSO-LSSVM) (Wang et al. 2012; Sedighizadeh and Kashani 2014).

The related parameters were set as shown in Table 3. The four methods above were used to predict the reconciliation model error and to correct the model with the predicted error. The results are shown in Figs. 9 and 10.

Table 3 Parameters setting of different methods

Fig. 9

Comparison of training results for different methods. a Training result of model error, b corrected result of model error after training

Fig. 10

Comparison of prediction results for different methods. a Prediction result of model error, b corrected result of model error after prediction

To show the differences between the methods more clearly, they are also compared in terms of the root mean square error (RMSE), the mean absolute error (MAE), the relative mean absolute error (\(E_{\text{ave}}\)), simulation time and the distribution of the absolute relative error. The results are shown in Table 4 and Fig. 11.
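
The three accuracy measures can be computed as in the following minimal sketch. RMSE and MAE follow their usual definitions; the formula used for \(E_{\text{ave}}\), the mean of the absolute error divided by the absolute true value, is an assumption, since the text does not define it, and the function names are hypothetical.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def relative_mae(y, y_hat):
    """Assumed form of E_ave: mean of |y - y_hat| / |y| (requires nonzero y)."""
    return float(np.mean(np.abs(y - y_hat) / np.abs(y)))
```

RMSE penalizes large deviations more strongly than MAE, so reporting both gives a view of the typical error and of the influence of outliers.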

Table 4 Performance comparison of four methods

Fig. 11

Distribution of prediction relative errors for four methods

The predicted values of all four methods can be used to correct the model error. The comparison above shows that all four methods eliminate the influence of systematic error on the inventory reconciliation model to some extent, but their performance differs. Overall, the LSSVM models optimized by the fruit fly algorithm have smaller errors than the model optimized by particle swarm optimization. Because feature extraction by partial least squares excludes the effects of noise and resolves the multicollinearity among the input variables, the methods that use it take less time and achieve higher accuracy than the others. The PSO-LSSVM method consumed the longest time and produced the largest errors in predicting the error of the inventory reconciliation model. The PLS and MFOA-LSSVM method used here not only shortens the simulation time and improves the accuracy, but also yields a better relative error distribution than the other methods. Compared with the other three methods, it improved RMSE, MAE and \(E_{\text{ave}}\) by 1.48, 0.93 and 2.91%, respectively, and it saved up to 17.6% of the modeling time.

Conclusion

Building on instrument error prediction methods, this paper uses a modular least squares support vector machine to predict the error of inventory reconciliation so that it can subsequently be eliminated. On the one hand, feature extraction for the independent variables is implemented by partial least squares regression. On the other hand, the problems of heavy computation and low accuracy in error prediction are addressed by using the modified fruit fly algorithm to optimize the parameters of the least squares support vector machine. Compared with three other prediction methods, the experiments show that the PLS and MFOA-LSSVM method can predict the error of the inventory reconciliation model effectively.