1 Introduction

Three-phase induction motors (IMs) are the main industrial workhorses, and they consume both active and reactive power [1, 2]. IMs are well-known inductive loads that cause serious power quality problems in electrical systems [3]. Inductive loads consume considerable reactive power, and this consumption in loads such as IMs creates energy loss and voltage drop in electrical systems [4]. To reduce the energy loss and voltage drop, the PF of an IM, which is the ratio of active power to apparent power, must be maintained close to unity [5, 6]. In IMs, the active and reactive power vary as the motor load changes from no-load to full-load, and consequently the PF changes [7]. Bimbahara [8] explains the reason: at no-load there is no mechanical load, so only the magnetizing reactance and the motor resistance losses are present. The stator current divides into two components, active and reactive, to supply the mechanical load and the magnetizing reactance, respectively. Since the resistance losses (friction and windage losses) are quite small, only a small active current flows to cover them, while the majority of the stator current flows as reactive current through the magnetizing reactance; the no-load current therefore lags the stator voltage by a large angle, in the range of 75°–85° [9]. The stator PF at no-load is thus approximately between 0.1 and 0.3. However, as the mechanical load increases, the active current (and power) increases gradually and flows to the rotor side to supply the mechanical load. This decreases the phase angle and improves the stator-side power factor to about 0.8–0.9 [9].

In industrial plants, many motor loads vary, and motors sometimes operate at light load, which causes a low PF. In most IMs operating at low PF, the extra current drawn is stored in the magnetic field of the windings and regenerated back to the grid in each AC cycle [10, 11]. This exchange is known as reactive current, and it can be a cost factor. To reduce this cost, reactive power must be generated locally in order to improve the low PF [12]. A capacitor bank is one of the most effective means of generating reactive power and correcting a low PF. Obtaining the optimum value of the required reactive power is still an unresolved challenge, because in many cases an improperly sized capacitor bank leads to under- or over-correction, where under-correction leaves a low PF and over-correction causes self-excitation in IMs [13]. Knowing the PF at any loading condition makes it possible to obtain the proper capacitor size. The PF can be determined from the equivalent circuit of the IM, which contains the rotor and stator parameters. However, identifying the equivalent circuit parameters requires no-load and locked-rotor tests, which is a practical difficulty [2].

Many papers have presented methods to determine the parameters using measured data and other information available from manufacturer data. For instance, Pedra [14], Haque [15] and Marcondes and Guimaraes [16] presented determination of IM parameters from manufacturer data. Estimation of IM parameters with a genetic algorithm is reported by Phumiphak and Chat-uthai [17]. However, using the equivalent circuit method, the PF can be obtained only at no-load and full-load. To determine the PF at intermediate loads with this method, the slip or rotor speed is required, and measuring these quantities is difficult [18].

Ukil presented a method using measured current and manufacturer data (MCMD) to determine the PF of a small IM [9]. It also used voltage and motor current measurements with the zero-crossing method and the instantaneous power method to obtain the PF at any load. This method performed poorly, particularly for large IMs, due to the variation of the reactive current. A normal meter makes it difficult to measure the PF at every single loading point because of numerical fluctuation. A power analyzer can measure and record the PF at every loading point and thus resolve the reading issue. However, a power analyzer is not only an expensive device, but it also requires the motor to be switched off for cable connection [19].

Therefore, estimation techniques are an economical solution to predict the PF at every loading point. These techniques also enable online monitoring, which enhances the reliability and security of power quality in electrical systems. In this research, several statistical methods in two categories are used: kriging and polynomial regression as numerical techniques, and artificial neural network (ANN) and support vector regression (SVR) as intelligent techniques, to estimate the PF of a 100 HP IM from no-load to full/over-load conditions. These methods require input data, which can be taken either from the motor datasheet or from a few measurement points of voltage, current and input power from no-load to full-load. In this paper, the estimated PF is compared with the PF measured both by simulation and in practical work.

2 Case studies

In this study, a three-phase IM rated at 100 HP, driving a stone-cutting machine, is considered. The measurements were taken while the operator gradually advanced the blade to cut stone of varying volume. A Unipower UP-2210 power analyzer is used to measure and record all three-phase quantities, including voltage, current, active and reactive power, PF and harmonics. The power analyzer stored all quantities at 6-min intervals from no-load to full/over-load conditions. In addition, a three-phase 100 HP IM with the same specifications as the industrial motor is modeled in MATLAB/Simulink. The motor load is increased step by step by applying a load torque, and a simulated PF meter then measures the PF from no-load to full-load. The simulation diagram is shown in Fig. 1. The PF measured by the power analyzer and by simulation is illustrated in Fig. 2, where the simulated PF is close to the PF measured by the power analyzer.

Fig. 1
figure 1

Simulated IM by MATLAB/Simulink

Fig. 2
figure 2

Measured PF by MATLAB/Simulink

3 Kriging technique

Kriging is a geostatistical method known as an interpolation technique. Kriging estimates an unknown value from nearby observed values at surrounding locations and weights them so as to minimize the error of the predicted value. Kriging is most applicable in cases where the distance between each observed point and the unknown point is known. The general kriging equation is expressed in Eq. (1).

$$ \hat{Z}_{{\left( {S_{0} } \right)}} = \mathop \sum \limits_{i = 1}^{N} W_{i} Z_{{\left( {S_{i} } \right)}} $$
(1)

where \( Z_{(S_i)} \) is the observed value at the \( i \)th location, \( W_i \) is the unknown weight for the observed value at the \( i \)th location, \( S_0 \) is the prediction location and \( N \) is the number of observed values. To apply this equation, obtaining the weights \( W_i \) is the key step; the \( W_i \) can be computed from a semivariogram. The semivariogram is a function that relates the semivariance of the data points and describes the spatial autocorrelation of the observed values. There are many semivariogram models, such as the exponential, Gaussian and spherical models. The exponential model, which is suitable for estimating PF, is applied in this study. The exponential function is expressed in Eq. (2)

$$ \gamma(h) = c\left( {1 - \exp \left( {\frac{ - 3h}{a}} \right)} \right) $$
(2)

where \( c \) is the sill, the semivariance at which the model levels off, \( h \) is the distance between variables and \( a \) is the range, the maximum distance on the x-axis of the semivariogram model. The key point of this method is to apply a suitable semivariogram model in order to achieve high output accuracy. The exponential model is selected because its shape is similar to the PF curve. Therefore, in Eq. (2), \( c \) is replaced by the rated PF at maximum load \( (m_{\text{PF}}) \), \( h \) is the distance between load points and \( a \) is replaced by the maximum load \( (m_{\text{L}}) \); \( \gamma(h) \) is then the semivariogram of the exponential model. A Lagrange matrix is applied to obtain the weights of the observed values. Two main vectors are needed in the matrix: one contains the semivariogram values between the observed points, and the other contains the semivariogram values between each observed point and the point to be estimated. The Lagrange multiplier system is expressed in Eq. (3).

$$ \begin{bmatrix} W_{1} \\ W_{2} \\ \vdots \\ W_{n} \\ \lambda \end{bmatrix} = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1n} & 1 \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2n} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \gamma_{n1} & \gamma_{n2} & \cdots & \gamma_{nn} & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix}^{-1} \begin{bmatrix} \gamma_{10} \\ \gamma_{20} \\ \vdots \\ \gamma_{n0} \\ 1 \end{bmatrix} $$
(3)

In the Lagrange system, the unknown vector (of size \( (n+1) \times 1 \)) contains the weights \( W_i \) together with the multiplier \( \lambda \), \( \gamma_{ij} \) is the \( n \times n \) matrix of semivariogram values between the observed points (augmented with a row and column of ones for the Lagrange constraint) and \( \gamma_{i0} \) is the \( n \times 1 \) vector of semivariogram values between the observed loading points and the unknown loading point. Thus, from the obtained values of \( W_1, W_2, \ldots, W_n \), the unknown PF can be estimated by Eq. (4).

$$ {\text{PF}} = W_{1} F_{1} + W_{2} F_{2} + W_{3} F_{3} + \cdots + W_{n} F_{n} $$
(4)

where \( W_i \) is the weight relating the estimated point to the \( i \)th observed point and \( F_i \) is the observed PF. Multiplying the observed PF values by the obtained weights yields the PF at the desired point. In the kriging algorithm, a loop is applied in order to estimate the PF at any desired loading condition [20, 21].
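To make the procedure concrete, the following is a minimal Python sketch of ordinary kriging with the exponential semivariogram of Eq. (2). The load points, PF values, sill and range below are illustrative placeholders, not the paper's measured data.

```python
import numpy as np

def exp_variogram(h, sill, rng):
    """Exponential semivariogram, Eq. (2): gamma(h) = c * (1 - exp(-3h/a))."""
    return sill * (1.0 - np.exp(-3.0 * np.abs(h) / rng))

def kriging_pf(loads, pf_obs, target, sill, rng):
    """Estimate PF at `target` load by ordinary kriging (Eqs. (1), (3), (4))."""
    n = len(loads)
    # Build the (n+1) x (n+1) Lagrange system of Eq. (3).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(loads[:, None] - loads[None, :], sill, rng)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_variogram(loads - target, sill, rng)
    w = np.linalg.solve(A, b)          # weights W_1..W_n plus multiplier lambda
    return w[:n] @ pf_obs              # Eq. (4): PF = sum_i W_i * F_i

# Illustrative observed points (percent load vs. PF), not the paper's data.
loads  = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
pf_obs = np.array([0.15, 0.55, 0.75, 0.84, 0.88])
print(kriging_pf(loads, pf_obs, target=60.0, sill=0.88, rng=100.0))
```

Wrapping the last call in a loop over target loads reproduces the paper's point-by-point estimation from no-load to full-load.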

4 Regression technique

Regression analysis is a form of statistical modeling used to describe the relationship between an independent variable \( x \) and a dependent variable \( y \). Regression analysis is also applied to estimate coefficients (\( \beta_i \)) linking the dependent variables (\( y_i \)) to the independent variables (\( x_i \)).

The estimated coefficients (\( \beta_i \)) applied to the independent variables (\( x_i \)) produce new fitted values (\( \hat{y}_i \)) of the model. The least squares method, the most widely used approach in regression analysis, estimates the coefficients. Many functions can be used in the least squares method, including polynomial, exponential, logarithmic and power functions. Among these, the polynomial function is the most suitable here because it can provide a model of any degree \( n \) [15]. The polynomial function is expressed by Eqs. (5) and (6).

$$ y_{i} = f\left( {x_{i} ,\beta } \right) + \varepsilon_{i} ,\quad i = 1,2, \ldots ,n $$
(5)
$$ \hat{y}_{i} = f(x_{i} ,\beta ) = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2} + \cdots + \beta_{m} x_{i}^{m} $$
(6)

where, in Eq. (5), \( y_i \) is the observed value, \( f(x_i, \beta) \) is the polynomial function and \( \varepsilon_i \) is the error between the observed and estimated values. In Eq. (6), \( (\beta_0, \beta_1, \beta_2, \ldots, \beta_m) \) are the polynomial coefficients, where \( m \) is the polynomial degree, and \( (x_i, x_i^2, \ldots, x_i^m) \) are the powers of the independent variable for each of the \( n \) observations. The coefficients \( \beta \) applied to \( x_i \) provide the estimated value \( \hat{y}_i \), and the difference between \( y_i \) and \( \hat{y}_i \) is \( \varepsilon_i \). By obtaining the values of \( \beta \) and \( \varepsilon_i \), the value of \( y \) for a new \( x \) can be determined. Equations (5) and (6) can therefore be written in matrix form as Eq. (7).

$$ \left[ Y \right] = \left[ X \right]\left[ \beta \right] + \left[ \varepsilon \right] $$
(7)

where [Y] is the n-by-1 vector of dependent variables, [X] is the n-by-m matrix of estimators with one column for each estimator and one row for each observation, [β] is the m-by-1 vector of unknown parameters, which can be obtained by Eq. (8), and [ε] is an n-by-1 vector. To minimize the errors ε, the least squares criterion of Eq. (9) is applied, and a Vandermonde matrix is used to solve the system.

$$ \beta = \left( {X^{\text{T}} X} \right)^{ - 1} X^{\text{T}} Y $$
(8)
$$ {\text{SSE}} = \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} $$
(9)

In polynomial regression, the polynomial degree determines the model shape: a first-order polynomial gives a linear model, while second- and third-order polynomials, known as quadratic and cubic, give nonlinear models. Knowledge of the true underlying model is therefore important for choosing the polynomial degree that gives the best fit [18, 20].
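As a hedged illustration of Eqs. (7)–(9), the sketch below builds the design matrix for a low-degree polynomial, solves for β with a numerically stable least squares call (equivalent to Eq. (8)) and evaluates the SSE of Eq. (9); the sample x and y values are invented for demonstration.

```python
import numpy as np

# Illustrative data (not the paper's measurements).
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([0.15, 0.55, 0.75, 0.84, 0.88])

# Design matrix X for a 2nd-degree polynomial: columns [1, x, x^2].
X = np.vander(x, N=3, increasing=True)

# Eq. (8): beta = (X^T X)^{-1} X^T Y; lstsq is the stable way to compute it.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Eq. (9): sum of squared errors between observed and fitted values.
y_hat = X @ beta
sse = np.sum((y - y_hat) ** 2)
print(beta, sse)
```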

polyfit and polyval are key functions in MATLAB's statistics tools. The polyfit function is used to obtain the coefficients of a polynomial of degree \( n \). It is called as \( p = \text{polyfit}(x, y, n) \), where \( x \) contains the observed points (the independent variable), \( y \) contains the observed values (the dependent variable) and \( n \) is the degree of the polynomial, which determines the powers associated with the coefficients in \( p \). polyfit obtains the polynomial coefficients by the least squares procedure for the selected degree (the length of \( p \) is \( n + 1 \)). Internally, polyfit forms a Vandermonde matrix from the independent variable \( x \) with \( n + 1 \) columns and solves for the coefficients with \( p = V\backslash y \), as expressed in Eq. (10).

$$ \begin{pmatrix} p_{1} \\ p_{2} \\ \vdots \\ p_{n + 1} \end{pmatrix} = \begin{pmatrix} x_{1}^{n} & x_{1}^{n - 1} & \cdots & 1 \\ x_{2}^{n} & x_{2}^{n - 1} & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ x_{m}^{n} & x_{m}^{n - 1} & \cdots & 1 \end{pmatrix}^{ - 1} \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{pmatrix} $$
(10)

polyval is a function that evaluates \( p \) at query points. It is called as \( y = \text{polyval}(p, x) \), where the output \( y \) is the degree-\( n \) polynomial with coefficients \( p \) evaluated at the query points \( x \). The combination of the two functions, with a suitable degree, can predict values at unknown points with good accuracy.
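The MATLAB polyfit/polyval workflow described above maps directly onto NumPy functions of the same names; the following sketch, with placeholder load/PF data, shows the same two-step fit-and-evaluate pattern.

```python
import numpy as np

# Placeholder observed points (per-unit load vs. PF), for illustration only.
load = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
pf   = np.array([0.15, 0.48, 0.68, 0.79, 0.85, 0.88])

# p = polyfit(x, y, n): least squares coefficients of an n-th degree polynomial.
p = np.polyfit(load, pf, 4)        # 4th order, found best-fitting in Sect. 7

# y = polyval(p, x): evaluate the fitted polynomial at query points.
query = np.linspace(0.0, 1.0, 11)
pf_est = np.polyval(p, query)
print(np.round(pf_est, 3))
```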

5 Artificial neural network

In artificial neural networks (ANNs), the back-propagation network is a multilayer feed-forward network and one of the most widely applied neural network models. Back-propagation uses mean squared error and gradient descent to adjust the connection weights toward the minimum error sum of squares. In this algorithm, measured values are given to the network as training samples, and initial values are assigned to the connection weights [22]. To update the weights, the error between the estimated and measured values is back-propagated through the network.

The error between the estimated and measured values is then reduced through this supervised learning procedure. For the nonlinear model, the network is structured as a three-layer feed-forward back-propagation network.

As Fig. 3 shows, the structure of this network contains an input layer, a hidden layer of neurons (nonlinear transfer function) and an output layer of neurons (linear transfer function). \( x_j\ (j = 1, 2, \ldots, n) \) denotes the input variables, \( z_i\ (i = 1, 2, \ldots, m) \) denotes the outputs of the neurons in the hidden layer, and \( y_t\ (t = 1, 2, \ldots, l) \) denotes the outputs of the neural network [23].

A neural network can learn almost any pattern given sufficient input data. Training the network with a suitable method, such as Levenberg–Marquardt back-propagation, determines the weights that best fit the inputs to the targets. The training process of updating the weight values consists of two main steps (Fig. 3).

Fig. 3
figure 3

(Reproduced with permission from [23])

Structure of NNBP layers.

The first step is the hidden layer: Eqs. (11) and (12) give the outputs of all neurons in the hidden layer. \( \text{net}_i \) is the activation value of the \( i \)th node, \( z_i \) is the output of the hidden layer, and \( f_H \) in Eq. (13) is the activation function, here the sigmoid function.

$$ {\text{net}}_{i} = \mathop \sum \limits_{j = 0}^{n} w_{ji} x_{j} ,\quad i = 1,2, \ldots ,m $$
(11)
$$ z_{i} = f_{H} \left( {{\text{net}}_{i} } \right),\quad i = 1,2, \ldots ,m $$
(12)
$$ f_{H} \left( x \right) = \frac{1}{{1 + \exp \left( { - x} \right)}} $$
(13)

The second step is the output layer, where Eq. (14) gives the outputs of all neurons in the output layer.

$$ y_{t} = f_{t} \left( {\mathop \sum \limits_{i = 0}^{m} w_{it} z_{i} } \right),\quad t = 1,2, \ldots ,l $$
(14)

where \( f_t\ (t = 1, 2, \ldots, l) \) is a linear function. The weights are compared against the observed values and updated by the delta rule according to the learning samples so that the error is minimized. The topology in this study is determined from a set of observed values and the resulting errors in order to select a suitable number of neurons.
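The following is a minimal NumPy sketch of the forward and backward passes of Eqs. (11)–(14) for a network with one sigmoid hidden layer and a linear output. The paper trains with Levenberg–Marquardt; plain gradient descent is used here for brevity, and the training data are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder training data: per-unit load -> PF (not the paper's data).
x = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
t = np.array([[0.15], [0.55], [0.75], [0.84], [0.88]])

m = 3                                    # hidden neurons
W1 = rng.normal(0, 0.5, (1, m)); b1 = np.zeros(m)
W2 = rng.normal(0, 0.5, (m, 1)); b2 = np.zeros(1)

def sigmoid(a):                          # Eq. (13) activation
    return 1.0 / (1.0 + np.exp(-a))

lr = 0.5
for _ in range(20000):
    # Forward pass: Eqs. (11), (12), then the linear output of Eq. (14).
    z = sigmoid(x @ W1 + b1)
    y = z @ W2 + b2
    # Backward pass: gradient of the mean squared error (delta rule).
    e = y - t
    gW2 = z.T @ e / len(x); gb2 = e.mean(0)
    dz = (e @ W2.T) * z * (1 - z)
    gW1 = x.T @ dz / len(x); gb1 = dz.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# PF estimate at 60% load after training.
print(sigmoid(np.array([[0.6]]) @ W1 + b1) @ W2 + b2)
```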

6 Support vector regression

The theory of SVR was developed by Vapnik in 1997 and is known as one of the most significant techniques for solving regression problems. The SVR method constructs a hyperplane in a high-dimensional space in order to minimize the generalization error between defined upper and lower bounds [24]. SVR is intrinsically linear, but by mapping the input space into a higher-dimensional space it constructs a hyperplane close to all data points and can thereby solve a nonlinear model. The data set is \( D = \{ (X_i, t_i) \}_{i=1}^{n} \), where \( X_i \) is the input vector, \( t_i \) is the target output and \( n \) is the number of data samples. The regression function is expressed in Eq. (15).

$$ Y = f\left( x \right) = w\phi \left( x \right) + b $$
(15)

where \( \phi(x) \) is the mapping into the high-dimensional feature space, \( X \) is an m-dimensional feature vector, and \( w \) and \( b \) are the SVR coefficients that solve the regression problem. \( w \) and \( b \) are found by minimizing the regularized empirical risk function in Eq. (16) with the loss function in Eq. (17).

$$ R_{\text{emp}} = C\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} L_{\varepsilon } \left( {t_{i} ,y_{i} } \right) + \frac{1}{2}w^{\text{T}} w $$
(16)
$$ L_{\varepsilon } \left( {t_{i} - y_{i} } \right) = \left\{ {\begin{array}{*{20}l} 0 & {\left| {t_{i} - y_{i} } \right| \le \varepsilon } \\ {\left| {t_{i} - y_{i} } \right| - \varepsilon } & {\text{otherwise}} \\ \end{array} } \right. $$
(17)

In the risk function, \( C\frac{1}{n}\sum\nolimits_{i = 1}^{n} L_{\varepsilon}(t_i, y_i) \) is the empirical risk error and \( \frac{1}{2} w^{\text{T}} w \) is the regularization term, or flatness of the function, which is minimized to keep the model simple. \( L_{\varepsilon}(t_i - y_i) \) is the \( \varepsilon \)-insensitive loss function. The parameter \( C \), called the capacity of the SVR, decides the trade-off between the regularization term and the empirical risk. \( \varepsilon \) defines the size of the insensitive tube around the function within which training points incur no loss. SVR performs linear regression in the high-dimensional feature space using the \( \varepsilon \)-insensitive loss and at the same time tries to reduce model complexity by minimizing \( w^{\text{T}} w \). The minimization is handled by introducing slack variables \( \xi_i^-, \xi_i^+,\ i = 1, \ldots, n \), which measure the excess of the \( \varepsilon \)-insensitive loss. Figure 4 indicates \( \varepsilon \) and the limits \( \xi \) in the \( \varepsilon \)-insensitive function.

Fig. 4
figure 4

(Reproduced with permission from [25])

Error \( \varepsilon \) and limits \( \xi \) in the ε-insensitive function.
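For reference, the \( \varepsilon \)-insensitive loss of Eq. (17) is only a few lines of code; the sketch below is a direct transcription.

```python
import numpy as np

def eps_insensitive_loss(t, y, eps):
    """Eq. (17): zero inside the eps-tube, linear outside it."""
    return np.maximum(np.abs(t - y) - eps, 0.0)

# Example: an error of 0.05 with eps = 0.02 costs 0.03; 0.01 costs nothing.
print(eps_insensitive_loss(np.array([0.80, 0.80]), np.array([0.75, 0.79]), 0.02))
```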

The parameters \( C \) and \( \varepsilon \) are set by the designer during the training step to optimize the slack variables [25]. To calculate the parameters \( w \) and \( b \), Eq. (16) is reformulated as the constrained optimization problem of Eq. (18), in which the slack variables \( \xi_i^- \) and \( \xi_i^+ \) represent the lower and upper deviations of the output.

$$ \begin{aligned} & {\text{Minimize}}\quad \frac{1}{2}w^{\text{T}} w + C\mathop \sum \limits_{i = 1}^{n} \left( {\xi_{i}^{ - } + \xi_{i}^{ + } } \right) \\ & {\text{Subject}}\;{\text{to}}\;{\text{the}}\;{\text{constraints:}} \\ & \left\{ {\begin{array}{*{20}l} {\alpha_{i}^{ + } :} & { - t_{i} + y_{i} + \varepsilon + \xi_{i}^{ + } \ge 0} \\ {\alpha_{i}^{ - } :} & {t_{i} - y_{i} + \varepsilon + \xi_{i}^{ - } \ge 0} \\ {\mu_{i}^{ + } :} & {\xi_{i}^{ + } \ge 0} \\ {\mu_{i}^{ - } :} & {\xi_{i}^{ - } \ge 0} \\ \end{array} } \right. \end{aligned} $$
(18)

where \( \alpha_i^+, \alpha_i^- \) and \( \mu_i^+, \mu_i^- \) are the Lagrange multipliers. \( w \) is obtained by setting the partial derivatives to zero, as in Eq. (19).

$$ \begin{aligned} & \frac{{\partial R_{f} }}{\partial w} = 0\;\Rightarrow\; w = \mathop \sum \limits_{i} (\alpha_{i}^{ + } - \alpha_{i}^{ - } )x_{i} \\ & \frac{{\partial R_{f} }}{\partial b} = 0\;\Rightarrow\; \mathop \sum \limits_{i} (\alpha_{i}^{ + } - \alpha_{i}^{ - } ) = 0 \\ & \frac{{\partial R_{f} }}{{\partial \xi_{i}^{ + } }} = 0\;\Rightarrow\; \alpha_{i}^{ + } + \mu_{i}^{ + } = C \\ & \frac{{\partial R_{f} }}{{\partial \xi_{i}^{ - } }} = 0\;\Rightarrow\; \alpha_{i}^{ - } + \mu_{i}^{ - } = C \\ \end{aligned} $$
(19)

Moreover, two quantities are required to obtain \( b \): \( w \), calculated from Eq. (19), and the support vector set \( S \), given by Eq. (20). Using Eqs. (19) and (20), \( b \) is determined by Eq. (21), and the regression function of Eq. (22) then solves the nonlinear problem.

$$ S = \left\{ {i| \quad 0 < \alpha _{i}^{ + } + \alpha _{i}^{ - } < C} \right\} $$
(20)
$$ b = \frac{1}{{\left| S \right|}}\mathop \sum \limits_{i \in S} \left[ {t_{i} - w^{\text{T}} x_{i} - {\text{sign}}(\alpha_{i}^{ + } - \alpha_{i}^{ - } )\varepsilon } \right] $$
(21)
$$ y = \mathop \sum \limits_{i} (\alpha _{i}^{ + } - \alpha _{i}^{ - } )K\left( {X_{i} , X_{j} } \right) + b $$
(22)

where \( (\alpha_i^+ - \alpha_i^-) \) are the support vector coefficients and \( K(X_i, X_j) \) is the kernel function. Several kernel functions can be used to solve the minimization problem. In this study, the Gaussian radial basis function (RBF) of Eq. (23) is used, where \( \sigma \) is the dispersion coefficient of the Gaussian.

$$ K\left( {X_{i} ,X_{j} } \right) = \exp \left( { - \frac{{\left\| {X_{i} - X_{j} } \right\|^{2} }}{{2\sigma^{2} }}} \right) $$
(23)

In the Lagrange formulation, the Karush–Kuhn–Tucker (KKT) conditions and quadratic programming assign nonzero values to the \( \alpha_i^+, \alpha_i^- \) that define the support vectors. Multiplying the support vector coefficients by the kernel \( K(X_i, X) \) produces outputs whose errors are equal to, less than or greater than \( \varepsilon \). The kernel function equals the inner product of the vectors \( X_i \) and \( X_j \) mapped into the feature space as \( \phi(X_i) \) and \( \phi(X_j) \), i.e., \( K(X_i, X_j) = \phi(X_i) \cdot \phi(X_j) \). Training the SVR therefore amounts to solving a quadratic, convex optimization problem [25].
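A practical way to reproduce this pipeline is scikit-learn's SVR with an RBF kernel, where gamma corresponds to \( 1/(2\sigma^2) \) in Eq. (23). The sketch below is a hedged illustration: the C, epsilon and gamma values and the load/PF data are assumptions standing in for the designed parameters of Table 2, not the paper's actual settings.

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder training points (per-unit load vs. PF), illustrative only.
X = np.array([[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]])
t = np.array([0.15, 0.48, 0.68, 0.79, 0.85, 0.88])

# RBF kernel as in Eq. (23); gamma = 1 / (2 * sigma^2). C and epsilon are
# assumed values, not the designed parameters reported in Table 2.
model = SVR(kernel="rbf", C=100.0, epsilon=0.01, gamma=2.0)
model.fit(X, t)

query = np.linspace(0.0, 1.2, 7).reshape(-1, 1)   # includes the over-load region
print(np.round(model.predict(query), 3))
```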

7 Results and discussions

The input power measurement method is applied to calculate the motor load so that the PF can be plotted against load. Determining the PF at any load point makes it possible to select the proper capacitor size and so prevent under- or over-correction. Under-correction leaves a low PF, which incurs a penalty charge. Over-correction generates more reactive power, or current, than the motor needs; in such cases, self-excitation takes place because the injected reactive current exceeds the magnetizing current.

Hence, the foregoing reasons demonstrate the importance of knowing the PF from no-load to full/over-load conditions. Ukil published a method using measured current and manufacturer data (MCMD) to estimate the PF from no-load to full/over-load, and the result for the 100 HP IM in Fig. 5 shows that the MCMD method produces huge errors across this range in large IMs. The method gives satisfactory performance only in small IMs, because in a small IM the reactive current is almost constant from no-load to full/over-load.

Fig. 5
figure 5

Results of estimated PF by MCMD method

However, the method is not able to estimate the PF in large IMs because the reactive current is not constant: the larger air gap causes the reactive current to change considerably as the motor load varies from no-load to full/over-load. In this paper, to resolve these issues, several estimation techniques are implemented in order to minimize the errors. Kriging and regression are used as numerical techniques to estimate the PF. In kriging, the distances between the target points and the observed load points are considered; then, using the exponential function and the Lagrange matrix, the weights of the observed points are computed. Multiplying the observed PF values by their weights gives the PF at a desired point, and a loop is applied to predict the PF at every load from no-load to full-load.

Figure 6 indicates that kriging generates results very close to the measured points from no-load to full-load. However, large errors appear between the estimated and measured values from full-load to over-load: since kriging is an interpolation technique, it cannot extrapolate into the over-load condition. In regression, a polynomial function is applied, in which the polynomial degree plays a significant role, since each degree creates a different model. polyval and polyfit in MATLAB are convenient functions to determine the polynomial coefficients and then create a model fitted to the observed PF curve.

Fig. 6
figure 6

Results of estimated target PF by Kriging method

Figure 7 and Table 1 illustrate the training data and the predicted polynomial coefficients for the first, second, third and fourth orders. The evidence confirms that fourth-order polynomial regression produces the best fit to the observed PF curve. Based on the fitted model, Fig. 8 indicates the estimated PF from no-load to full/over-load conditions. However, from full-load to over-load there is a huge gap between the estimated and measured PF. Although both methods produce results very close to the measured points from no-load to full-load, neither kriging nor regression can fit the model in the over-load condition, and the results show extreme errors there.

Fig. 7
figure 7

Results of fitness in polynomial regression

Table 1 Predicted coefficients of polynomial degrees
Fig. 8
figure 8

Results of estimated PF in polynomial regression

The reason is that neither method is able to extrapolate the data to unseen points. Figures 6 and 8 indicate that these methods give unacceptable results, with huge errors, at over-load conditions. To overcome these issues, this study applies intelligent techniques, namely ANN and SVR, to estimate the PF not only between the known observations, but also at over-load conditions, with high performance. In the ANN method, a feed-forward back-propagation algorithm is used, in which five input data points are selected for training and three for testing. Using three hidden neurons and the Levenberg–Marquardt training algorithm gives significant generalization.

Figure 9 shows the fit between the observed points and the network outputs, where the output values follow the input data. Figure 10 illustrates the estimated PF from no-load to full-load and over-load. The results show that NNBP achieves an excellent fit from no-load to over-load. Although NNBP produces results very close to the measured points with small error, the algorithm must be run several times, and each run generates different results. Obtaining the best result is therefore difficult, since the algorithm has to be run more than once; this is a main disadvantage of the ANN method. To overcome this issue of NNBP, SVR is used to provide a fixed model and estimate the PF at any loading point. The strategy of SVR is to construct a hyperplane close to all data points within lower and upper bounds. The RBF kernel function is used to obtain the support vectors. The parameters of SVR play a significant role in creating a suitable model, and in this case the proper design of the parameters yields a model very close to the observed points. The fitted model estimates the unknown points properly from no-load to full-load and over-load conditions. The estimated PF from no-load to over-load is shown in Fig. 11. As a result, the comparison between the implemented methods shows that the SVR method obtains satisfactory results in small, medium and large IMs. The possibility of adjusting the parameters, which allows desired models to be produced, is one of the main advantages of this method (Fig. 12).

Fig. 9
figure 9

Results of fitness in NNBP

Fig. 10
figure 10

Results of estimated PF in NNBP

Fig. 11
figure 11

Results of estimated PF in SVR

Fig. 12
figure 12

Results of fitness in SVR

The designed parameters of SVR are shown in Table 2.

Table 2 Parameters of SVR

As a result, Table 3 illustrates the fitness and accuracy of the fitted models, where the fitness is computed by R-squared and the accuracy is obtained by dividing the minimum by the maximum value at each point and then averaging. The error of the estimated points is calculated by the mean absolute percentage error (MAPE), also shown in Table 3. In kriging and MCMD, the input data are not trained as in the other methods, owing to their different strategies. The results show that MCMD produces the lowest accuracy, 85.5%, whereas SVR provides the highest accuracy, 99.6%, compared with the other methods. The computation times of the proposed methods are given in seconds.

Table 3 Validity and accuracy of proposed methods
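For reproducibility, the metrics described above can be computed as in the following sketch; the definitions follow the text's description and standard formulas, the function names are illustrative, and the sample values are made up.

```python
import numpy as np

def r_squared(t, y):
    """Coefficient of determination (the fitness column of Table 3)."""
    ss_res = np.sum((t - y) ** 2)
    ss_tot = np.sum((t - np.mean(t)) ** 2)
    return 1.0 - ss_res / ss_tot

def mape(t, y):
    """Mean absolute percentage error (the error column of Table 3)."""
    return 100.0 * np.mean(np.abs((t - y) / t))

def accuracy(t, y):
    """Pointwise min/max ratio, averaged, as the text's accuracy measure."""
    return np.mean(np.minimum(t, y) / np.maximum(t, y))

# Illustrative check with made-up measured (t) and estimated (y) PF values.
t = np.array([0.15, 0.55, 0.75, 0.84, 0.88])
y = np.array([0.16, 0.54, 0.74, 0.85, 0.88])
print(r_squared(t, y), mape(t, y), accuracy(t, y))
```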

8 Conclusions

The power factor of induction motors is one of the significant quantities that must be maintained close to unity. The power factor varies as the motor load changes from no-load to full/over-load. This variation makes monitoring and determining the power factor at any loading condition important for finding the optimal reactive power for power factor compensation. In this paper, several estimation techniques are applied to estimate the power factor at any loading condition. Kriging and regression, which are numerical methods, estimated the power factor with reasonable results from no-load to full-load; however, both methods produced very poor results from full-load to over-load. The neural network and support vector regression, which are intelligent techniques, produced better results from no-load to full/over-load conditions, and the support vector regression method showed satisfactory performance, with results more accurate than those of the ANN and the numerical methods.