1 Introduction

A control chart provides a visual representation of a process over time; hence, it is a valuable tool for monitoring process performance in statistical process control (SPC). Control charts have been widely used in industrial processes since the 1920s in various applications; see, for example, Aslam et al. (2020). Usually, control charting in SPC is implemented in two phases: Phase I (retrospective phase) and Phase II (monitoring phase). The aims of Phase I include assessing the stability of the process and estimating the in-control (IC) parameters and control limits through a preliminary retrospective study using historical datasets; in Phase II analysis, one uses the IC model obtained from Phase I as a baseline scheme to control the process in real time and detect changes as quickly as possible (Yeganeh et al. 2022a). It is therefore vital to control the process over time in Phase II, as changes in the process parameters might be caused by unnatural patterns arising from faults, non-conforming products, low-quality raw materials and so forth (Montgomery 2019; Gupta et al. 2006). In on-line monitoring (Phase II analysis), the instability of a process, which should be identified as early as possible, is declared with an out-of-control (OOC) signal. Performance is usually evaluated in terms of the number of points plotted on a control chart before an OOC signal, known as the average run length (ARL). For a fair evaluation, the ARL value for the IC process, referred to as ARL0, is fixed at a constant desired value, and the charts endeavour to provide the minimum ARL value in the OOC condition, called ARL1 (Yeganeh et al. 2023). Note that the greater the ARL1, the weaker the detection ability of a control chart (Montgomery 2019). In addition to the ARL, control chart performance metrics include the performance comparison index (PCI), standard deviation of run length (SDRL), relative average run length (RARL) and extra quadratic loss (EQL) (Riaz et al. 2014).
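Run-length metrics such as the ARL and SDRL are typically estimated by Monte Carlo simulation. The following minimal sketch illustrates the idea with a hypothetical chi-square charting statistic and a purely illustrative, uncalibrated upper limit; the function names are ours, not from any cited work:

```python
import numpy as np

def simulate_run_length(statistic_fn, ucl, rng, max_len=100_000):
    """Count the points plotted until the first one exceeds the UCL."""
    for t in range(1, max_len + 1):
        if statistic_fn(rng) > ucl:
            return t
    return max_len  # truncated run; rare when the UCL is calibrated

def arl_sdrl(statistic_fn, ucl, n_runs=2_000, seed=1):
    """Estimate ARL and SDRL of a chart from simulated run lengths."""
    rng = np.random.default_rng(seed)
    rls = np.array([simulate_run_length(statistic_fn, ucl, rng)
                    for _ in range(n_runs)])
    return rls.mean(), rls.std(ddof=1)

# Toy example: a chi-square statistic of two IC standard normals,
# with an illustrative UCL of 10 (not a calibrated limit).
stat = lambda rng: float(np.sum(rng.standard_normal(2) ** 2))
arl0, sdrl0 = arl_sdrl(stat, ucl=10.0)
```

In an actual Phase II study, `statistic_fn` would generate a profile, estimate its parameters, and return the charting statistic; the UCL is then tuned so that the estimated IC ARL hits the desired ARL0.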

To construct a control chart in SPC, either in Phase I or Phase II, two different approaches can be used: (i) investigating the distribution function of a single or multiple quality characteristics; (ii) checking the stability over time of a functional relationship between a dependent (response) variable and one or several independent (explanatory) variables. The term profile monitoring refers to the use of SPC techniques to investigate the stability of such a functional relationship over time, instead of monitoring a single or multivariate quality characteristic.

This topic was first introduced under the term ‘signature’ by Gardner et al. (1997), and the term ‘profile’ became more commonplace after exponentially weighted moving average (EWMA) control charts were extended to monitoring simple linear profiles, in which there is a linear relationship between a response and an explanatory variable (Kang and Albin 2000). Since then, many researchers have focused on monitoring linear profiles; see, for example, Gupta et al. (2006), Zou et al. (2007), Huwang et al. (2014), Motasemi et al. (2017), Haq (2020), Yeganeh et al. (2021) and the references therein. Other researchers have focused on monitoring other types of profiles, including nonlinear (Williams et al. 2007), roundness (Pacella and Semeraro 2011), exponential (Steiner et al. 2016), circular (Zhao et al. 2020), multi-channel autoregressive (Zhou et al. 2022) and non-parametric (Jones et al. 2020; Zou et al. 2008; Nassar and Abdel-Salam 2021) profiles.

The monitoring schemes mentioned above rest on two restrictive assumptions. First, the response variable is assumed to be continuous, whereas it may have a discrete form; for example, it may represent the number of defects (or the percentage of defective items) per unit. Second, it is usually assumed that the random errors, and therefore the response variables, follow the normal distribution; in many real-life applications, the normality assumption is violated. Considering these limitations, generalized linear models (GLMs), which constitute a large class of statistical models relating responses to linear combinations of predictor variables, have been extended to the profile monitoring regime. Two major categories of GLMs have had the most applications in the literature: logistic and Poisson profiles. For more details on the former, readers are referred to the works of Yeh et al. (2009), Shang et al. (2011), Huwang et al. (2016), Alevizakos et al. (2019a) and Mohammadzadeh et al. (2021). Recently, many studies have focused on monitoring Poisson profiles. For instance, Zhou et al. (2012) proposed an EWMA chart with random sample sizes, observing that a novel updating formulation not only yields more robust IC and OOC performance, but also a chart that is generally more sensitive to small and moderate shifts. Phase I monitoring of Poisson profiles was carried out by Amiri et al. (2015), who developed three different schemes based on the likelihood ratio test (LRT), Hotelling’s T2 and F statistics. To extend this work to Phase II, Qi et al. (2016) proposed the weighted LRT (WLRT) scheme by combining the EWMA and LRT statistics. They also evaluated the LRT, LRT-EWMA and multivariate EWMA (MEWMA) control charts in Phase II applications of Poisson profiles under the assumptions of fixed and random explanatory variables. The results showed the superiority of the WLRT over its competitors.
Later, Qi et al. (2017) extended the WLRT approach to autocorrelated processes. A change point statistic was developed by Shadman et al. (2017) for GLM profiles, building on an approach previously applied to the efficient monitoring of linear profile parameters (Xu et al. 2012). By considering every sample as a candidate change point, they computed the LRT statistic for the two groups of samples, i.e. IC (before the change point) and OOC (after the change point), for both logistic and Poisson profiles. More recently, the change point approach was also implemented for autocorrelated Poisson profiles by He et al. (2020), who, in addition to autocorrelation, assumed random explanatory variables. Another LRT control chart for profiles with random predictors and autocorrelation was proposed by Song et al. (2021). Similar to Shadman et al. (2017), Shang et al. (2018) extended MEWMA charts to Poisson and logistic profiles assuming no prior information about the process in OOC situations (i.e. non-parametric models). In other words, in parametric models only the profile parameters change while the type of relationship is fixed; in the non-parametric setting, the relationship can shift to another type without any limitations, for example, a linear IC profile changing to a non-linear OOC profile. Some remedial methods for parameter estimation in Poisson profiles and for computation of the process capability index can be found in Maleki et al. (2019) and Alevizakos et al. (2019b). A non-parametric approach to the generalized likelihood ratio and EWMA schemes for a real case study can be found in Wang et al. (2022).

An investigation of the aforementioned literature and the existing review papers in this field reveals that little attention has been given to machine learning approaches in comparison to statistical approaches, not only for GLM profiles, but for all types of profile monitoring (Maleki et al. 2018; Woodall 2007). To the best of the authors’ knowledge, artificial neural networks (ANNs) have only been used for profile monitoring in Hosseinifard et al. (2011), Pacella and Semeraro (2011), Yeganeh et al. (2022a) and Yeganeh and Shadman (2020). Li et al. (2019) used the support vector regression (SVR) technique for function fitting in the non-linear profile monitoring process. Autoencoders and transfer learning, which are deep learning techniques, have also been developed for autocorrelated (Chen et al. 2020) and multiple profiles (Fallahdizcheh and Wang 2022). One of the main reasons for this reluctance may be the weaker performance of machine learning techniques relative to statistical approaches in terms of the ARL. For example, the ANN-based control chart proposed by Hosseinifard et al. (2011) was not able to improve on the performance of conventional EWMA control charts for detecting most shifts in a simple linear profile model. To remedy this weakness, Yeganeh and Shadman (2020) improved the performance of Hosseinifard et al. (2011)’s control chart using supplementary run-rules, but no modification was made to the structure of the ANN of Hosseinifard et al. (2011). The same can be said about the other machine learning-based control charts mentioned above; in other words, they used a simple conventional structure of ANN or SVR, which may be one of the reasons for their weak performance. Another concern about machine learning techniques relates to their complexity: although machine learning techniques are more complex and often considered a “black box”, they can produce more accurate results (Cuentas et al. 2022). Moreover, with the rapid development of digital technologies, the complexity of machine learning techniques, in particular deep learning models, is becoming less important over time in real-world applications, where several complicated on-line models, such as image processing, computer vision and speech recognition systems, have been developed and can easily be applied in practice (Chen et al. 2020). In addition, interpreting the predictions of machine learning methods has been studied recently (Pourpanah et al. 2016); although this is not the focus of this study, it is an interesting area that requires further investigation.

Considering the limited use of machine learning techniques in profile monitoring, this paper introduces a novel SVR structure as a control chart for monitoring Poisson profiles in Phase II of SPC. The aim of this study is to develop a control chart with quicker detection ability than conventional schemes for the Poisson profile monitoring problem, which is equivalent to a better-optimized process with fewer non-conforming products, lower cost, less waste and fewer other undesirable outputs. To achieve this, we first define and extract more informative input features and then feed them into a well-known machine learning technique, i.e. SVR, for training in an offline manner. Finally, the trained models can be deployed to monitor the process online and detect any OOC situations. This approach makes considerable contributions not only in the input features of the SVR but also in the training procedure, to enhance the sensitivity in detecting OOC situations. In addition to improving the detection ability for a Poisson process with machine learning, the other contributions of this paper can be summarized as follows:

  • Developing a novel structure of SVR as a base control chart.

  • Introducing a new input layer structure for the proposed and other related schemes.

  • Taking advantage of a novel training procedure for the proposed SVR.

  • Enhancing the detection ability of the proposed charting technique in comparison with ANN.

  • Evaluating the performance of the proposed scheme under parametric and non-parametric scenarios.

  • Using the diagnosis procedure in Poisson profiles with SVR.

The rest of this paper is structured as follows: the fundamental framework of the Poisson profile model used in Phase II monitoring and the formulations of two fundamental control chart schemes are briefly presented in Sect. 2. In addition, a brief introduction to some important concepts of SVR is given in Sect. 2, including the principles of evolved SVRs and a description of the particle swarm optimization (PSO) algorithm. Section 3 provides a full description of the proposed SVR-based control chart. Section 4 investigates the performance of the proposed scheme in terms of the ARL and SDRL. Comparisons with existing counterparts are presented in Sect. 5. Section 6 presents the diagnosis procedure of the proposed approach, while Sect. 7 provides an illustrative example. Finally, the conclusion, recommendations and future research directions are presented in Sect. 8.

2 Preliminaries

2.1 Phase II Poisson profile monitoring

Assume that n observations are collected for the jth (j = 1, 2, …) random profile. Let (xijk, yij) represent the pairs of observations from the jth random profile in the GLM setting, with explanatory vectors Xij = (xij1, xij2, …, xijp), where i = 1, 2, …, n and k = 1, 2, …, p. Since in fixed-design monitoring the explanatory variables are assumed to be constant in each profile, the index j is omitted from the explanatory variables, so that Xij = Xi \(\forall j\). Thus, they can be written as an n × p matrix denoted by \(\tilde{\varvec{X}}\) and defined by

$$\tilde{\varvec{X}} = \left( {\begin{array}{*{20}c} {{\varvec{X}}_{{\varvec{1}}} } \\ \vdots \\ {{\varvec{X}}_{{\varvec{n}}} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {x_{11} } & {x_{12} } & \cdots & {x_{1p} } \\ \vdots & \vdots & \ddots & \vdots \\ {x_{n1} } & {x_{n2} } & \cdots & {x_{np} } \\ \end{array} } \right).$$
(1)

A GLM model in the jth sample consists of the following three main components:

  (i) The 1 × n response vector Yj = (y1j, y2j, …, ynj) of discrete observations, with mean \(\mu_{ij} = E\left( {y_{ij} |x_{i1} ,x_{i2} , \ldots ,x_{ip} } \right)\), whose components belong to the same distribution from the exponential family (e.g., the Poisson or binomial distribution). Given the independence of observations within and between profiles, we have µj = (µ1j, µ2j, …, µnj) for the jth profile.

  (ii) The matrix of independent variables, which is the same as in (1).

  (iii) The monotone link function g that connects the mean of the response variable to the linear combination of the predictors, so that \(g({\varvec{\mu}}_{j}) = {\varvec{\eta}}_{j} = \tilde{\varvec{X}}{\varvec{\beta}}_{j}\), where ηj is the linear combination of the jth profile parameters (βj). From this framework, a Poisson GLM is given by:

    $$\begin{gathered} {\varvec{Y}}_{{\varvec{j}}} \sim Poisson\left( {{\varvec{\mu}}_{{\varvec{j}}} } \right),\quad j = 1,2, \ldots , \hfill \\ \log \left( {{\varvec{\mu}}_{{\varvec{j}}} } \right) = \tilde{\varvec{X}}{\varvec{\beta}}_{{\varvec{j}}} . \hfill \\ \end{gathered}$$
    (2)

The parameters of model (2), whose estimates are denoted by \(\mathop {\varvec{\beta}_{j}}\limits^{\frown } = \left( \mathop {\beta_{1j}}\limits^{\frown } ,\mathop {\beta_{2j}} \limits^{\frown },...,\mathop {\beta _{pj}} \limits^{\frown } \right)\), are estimated with the iterative weighted least squares (IWLS) algorithm in this paper. To save space, the algorithm is not included here; for more details, readers are referred to Yeh et al. (2009) and Amiri et al. (2015). Hence, the aim of monitoring Poisson profiles is to detect changes in \({\varvec{\beta}}_{j}\) from its IC value, denoted by β0 = (β10, β20, …, βp0). Note that monitoring explanatory variables in GLM profiles is as important as monitoring the response variable (see, for example, Shang et al. 2011), but it is not the focus of this paper.
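For illustration, a bare-bones version of the IWLS iteration for a Poisson GLM with log link might look as follows; the design matrix, true coefficients and sample size are hypothetical, and production code would normally call a library GLM fitter instead:

```python
import numpy as np

def poisson_irls(X, y, tol=1e-8, max_iter=100):
    """Estimate beta in log(mu) = X @ beta by iteratively
    (re)weighted least squares for a Poisson response."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)            # current fitted means
        W = mu                           # Poisson working weights
        z = X @ beta + (y - mu) / mu     # working response
        XtW = X.T * W                    # X' W (W diagonal)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Hypothetical intercept-plus-slope design with beta0 = (1, 2)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), np.linspace(0.0, 1.0, 50)])
beta_true = np.array([1.0, 2.0])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = poisson_irls(X, y)
```

Note that the weight matrix \(W\) appearing here is the same diagonal matrix of means used later in the scaling matrix S.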

2.2 Existing control charts

In this subsection, the details of the two existing fundamental approaches based on the MEWMA and LRT schemes are provided (Qi et al. 2016). To simultaneously control the p-dimensional IC coefficient vector β0, its estimate \(\left( {\mathop {{\varvec{\beta}} _{j} }\limits^{\frown } } \right)\) is scaled as follows:

$${\varvec{Z}}_{j} = {\varvec{S}}\left( {\mathop {{\beta}} \limits^{\frown }_{j} - {\varvec{\beta}}_{0} } \right),$$
(3)

with \({\varvec{S}} = \left( {\tilde{\varvec{X}^{\prime}}W\tilde{\varvec{X}}} \right)^{\frac{1}{2}}\) and \(\mathop {{\varvec{\beta}} _{j} }\limits^{\frown } = \left( \mathop {\beta _{{1j}} }\limits^{\frown } ,\mathop {\beta _{{2j}} }\limits^{\frown } ,...,\mathop {\beta _{{pj}} }\limits^{\frown } \right)\), where S is a p × p symmetric matrix and β0 is the IC \(p\)-dimensional parameter vector. Considering µ0 = (µ10, µ20, …, µn0) and \(log\left( {\mu_{i0} } \right) = \varvec{X^{\prime}}_{i} {\varvec{\beta}}_{{\varvec{0}}}\), W is an n × n diagonal matrix with main diagonal elements µ10, µ20, …, µn0. It is worth mentioning that S depends on \(\tilde{\varvec{X}}\) in Eq. (1); hence, when the explanatory variables are not constant across profiles, the matrix varies from profile to profile and Sj is used instead of S.

From (3), the EWMA statistic for the scaled p-dimensional parameters vector is defined as:

$${\varvec{E}}_{j} = \lambda {\varvec{Z}}_{j} + (1 - \lambda ){\varvec{E}}_{j - 1} ,j = 1,2,...,$$
(4)

where E0 is a p-dimensional vector of zeros and λ (0 < λ < 1) is the EWMA constant (or smoothing parameter), set to 0.2 in this paper. The MEWMA statistic is then given by:

$$M_{j} = \varvec{E^{\prime}}_{j} {\varvec{E}}_{j} .$$
(5)
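Putting Eqs. (3)–(5) together, the MEWMA statistic for a stream of estimated coefficient vectors can be sketched as below; the identity scaling matrix and the simulated estimates are purely illustrative stand-ins for S and the IWLS output:

```python
import numpy as np

def mewma_statistics(beta_hats, beta0, S, lam=0.2):
    """Compute the MEWMA statistics M_j of Eqs. (3)-(5) for a
    sequence of estimated coefficient vectors (one row per profile)."""
    E = np.zeros(len(beta0))              # E_0: vector of zeros
    M = []
    for beta_hat in beta_hats:
        Z = S @ (beta_hat - beta0)        # Eq. (3): scaled deviation
        E = lam * Z + (1 - lam) * E       # Eq. (4): EWMA recursion
        M.append(float(E @ E))            # Eq. (5): squared norm
    return np.array(M)

# Hypothetical two-parameter example with S = identity for simplicity
rng = np.random.default_rng(2)
beta0 = np.array([1.0, 2.0])
beta_hats = beta0 + 0.1 * rng.standard_normal((20, 2))
M = mewma_statistics(beta_hats, beta0, np.eye(2))
```

In the actual chart, each row of `beta_hats` would come from the IWLS fit of one Phase II profile and S from the IC design and weight matrices.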

For more details on the LRT formulation for Poisson profiles, readers are referred to Amiri et al. (2015). By taking the logarithm of the joint likelihood function of the independent observations, the LRT statistic is constructed as:

$$LRT_{j} = 2\left( {l_{j} \left( {\mathop {\varvec{\beta}} \limits^{\frown }_{j} } \right) - l_{j} \left( {{\varvec{\beta}}_{0} } \right)} \right),$$
(6)
$$\begin{aligned} l_{j} \left( {\mathop {{\varvec{\beta}} _{j} }\limits^{\frown } } \right) = & \sum\limits_{{i = 1}}^{n} {y_{{ij}} \log \left( {\mu _{{ij}} } \right)} - \sum\limits_{{i = 1}}^{n} {\mu _{{ij}} } - \sum\limits_{{i = 1}}^{n} {\log \left( {y_{{ij}} !} \right)} , \\ l_{j} \left( {{\varvec{\beta}} _{0} } \right) = & \sum\limits_{{i = 1}}^{n} {y_{{ij}} \log \left( {\mu _{{i0}} } \right)} - \sum\limits_{{i = 1}}^{n} {\mu _{{i0}} } - \sum\limits_{{i = 1}}^{n} {\log \left( {y_{{ij}} !} \right)} , \\ \mu _{{ij}} = & e^{{{\varvec{X}}_{i}^{\prime } \mathop {{\varvec{\beta}} _{j} }\limits^{\frown } }} , \\ \mu _{{i0}} = & e^{{{\varvec{X}}_{i}^{\prime } {\varvec{\beta}} _{0} }} . \\ \end{aligned}$$

In the LRT scheme, lj(·) denotes the log-likelihood function, which evaluates the goodness of fit of a statistical model to a sample of data for given values of the unknown parameters; in (6) it is evaluated at both the IC parameters and the estimated profile parameters.
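A compact sketch of the LRT computation in (6) follows; the log(yij!) terms cancel in the difference and can be omitted. The intercept-only design is a hypothetical example, chosen because its MLE has the closed form log(ȳ):

```python
import numpy as np

def poisson_loglik(X, y, beta):
    """Poisson log-likelihood, omitting the log(y!) terms,
    which cancel in the LRT difference of Eq. (6)."""
    eta = X @ beta
    return float(np.sum(y * eta - np.exp(eta)))

def lrt_statistic(X, y, beta_hat, beta0):
    """LRT_j = 2 * ( l_j(beta_hat) - l_j(beta0) )."""
    return 2.0 * (poisson_loglik(X, y, beta_hat)
                  - poisson_loglik(X, y, beta0))

# Intercept-only example: the MLE of the single coefficient is log(y-bar)
rng = np.random.default_rng(3)
X = np.ones((30, 1))
y = rng.poisson(np.exp(1.0), size=30)
beta_hat = np.array([np.log(y.mean())])
lrt = lrt_statistic(X, y, beta_hat, np.array([1.0]))
```

Since `beta_hat` maximizes the likelihood here, the resulting statistic is non-negative, as the LRT construction requires.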

2.3 SVR formulation

In 1995, an innovative machine learning method called the support vector machine (SVM) was introduced by Vapnik (1995) to rectify drawbacks of ANN methods, especially in classification problems. The idea of the SVM is to minimize the training error through empirical or structural risk minimization. To this end, the features of a nonlinear problem are mapped to another hyperplane with the aim of maximizing the geometric margins and minimizing the classification error. Although the SVM has established itself as a powerful method for supervised classification problems, its general form only solves binary classification problems, and some adaptations are required for multi-class classification and regression problems. In this paper, the SVM for regression (hereafter, SVR) is used, and it is briefly described in this section. For more details, the interested reader is referred to Vapnik (1995), Cortes and Vapnik (1995), Vapnik (1998) and Stoean and Stoean (2014).

As with other supervised learning techniques, a training dataset is first prepared. For simplicity, but without loss of generality, we denote the inputs and targets by Bg and Tg respectively, where g indexes the samples. It is assumed that the inputs and targets are continuous values of dimension U and 1, respectively, and that there are G samples in the training dataset. Hence, we have a G × (U + 1) dataset, which can be written as (Bg, Tg); g = 1, 2, …, G. Conventional SVM and SVR usually utilise the following formulation to establish a relationship between the inputs and outputs (estimated targets):

$$f\left( {B_{g} } \right) = w\phi \left( {B_{g} } \right) + b.$$
(7)

In (7), a predefined kernel function ϕ(·) in combination with some weights (w) and bias (b) are used to carry out the mapping tasks and generally, the aim of training is to reach the best values for weights and bias. To obtain the weights and bias, a soft margin (i.e., a possible acceptable interval) is defined as

$$- \varepsilon - \xi_{g}^{ - } \le f\left( {B_{g} } \right) - T_{g} \le \varepsilon + \xi_{g}^{ + } ,$$
(8)

where ε is the acceptable absolute difference between the target values and the estimated ones, while \(\xi_{g}^{+}\) and \(\xi_{g}^{-}\) are slack variables representing the loss generated by the gth sample of the training dataset. In other words, the loss function is defined as:

$$Loss\left( {f\left( {B_{g} } \right),T_{g} } \right) = \left\{ {\begin{array}{*{20}c} {0\quad \quad \quad \quad \quad \;\;} & {\left| {f\left( {B_{g} } \right) - T_{g} } \right| \le \varepsilon } \\ {\left| {f\left( {B_{g} } \right) - T_{g} } \right| - \varepsilon } & {{\text{otherwise}}\quad \quad \quad } \\ \end{array} } \right.$$
(9)
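The ε-insensitive loss in (9) is a direct one-line transcription: zero inside the ε-tube and linear outside it.

```python
import numpy as np

def eps_insensitive_loss(pred, target, eps):
    """Eq. (9): zero inside the epsilon-tube, linear outside it."""
    return np.maximum(np.abs(np.asarray(pred) - target) - eps, 0.0)
```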

From (9), considering the principle of structural risk minimization, the following minimization problem leads to an optimum hyperplane or weights:

$$\begin{gathered} \mathop {\min }\limits_{{\left( {w,\xi_{g}^{ + } ,\xi_{g}^{ - } } \right)}} \,\,\frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{g = 1}^{G} {\left( {\xi_{g}^{ + } + \xi_{g}^{ - } } \right)} \hfill \\ {\text{subject}}\,{\text{to}}\left\{ {\begin{array}{*{20}c} { - f\left( {B_{g} } \right) + T_{g} + \varepsilon + \xi_{g}^{ + } \ge 0} & {\forall g{,}} \\ {f\left( {B_{g} } \right) - T_{g} + \varepsilon + \xi_{g}^{ - } \ge 0} & {\forall g{,}} \\ {\xi_{g}^{ - } \ge 0{,}\xi_{g}^{ + } \ge 0} & {\forall g} \\ \end{array} } \right. \hfill \\ \end{gathered}$$
(10)

Because of the complexity of the above model, the dual form of (10) is often used in SVR training instead of the primal model, which also removes the bias term (b) from the objective function. The dual optimization problem, in which the Karush–Kuhn–Tucker (KKT) conditions are incorporated into the constraints, is defined as follows:

$$\begin{gathered} \mathop {\min }\limits_{{\left( {\alpha_{g}^{ + } ,\alpha_{g}^{ - } } \right)}} \,\,\frac{1}{2}\left( {\alpha^{\prime}H\alpha } \right) + \tilde{q}\alpha \hfill \\ {\text{subject}}\,{\text{to}}\left\{ \begin{gathered} \sum\limits_{g = 1}^{G} {\left( {\alpha_{g}^{ + } - \alpha_{g}^{ - } } \right) = 0{,}} \hfill \\ 0 \le \alpha_{g}^{ + } \le C\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall g{,} \hfill \\ 0 \le \alpha_{g}^{ - } \le C\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall g{,} \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered}$$
(11)

where \(\alpha = \left[ {\alpha_{1}^{ + } ,\alpha_{2}^{ + } ,...,\alpha_{G}^{ + } ,\alpha_{1}^{ - } ,\alpha_{2}^{ - } ,...,\alpha_{G}^{ - } } \right]^{\prime}\) is a 2G × 1 vector of decision variables and \(\tilde{q}\) = [− T1 + ε, − T2 + ε, …, − TG + ε, T1 + ε, T2 + ε, …, TG + ε] is a 1 × 2G vector. Also, \(H = \left( {\begin{array}{*{20}c} h & { - h} \\ { - h} & h \\ \end{array} } \right)\) is a 2G × 2G kernel matrix with h(a,b) = ϕ(Ba, Bb). Hence, the optimization problem in (11) has 2G variables. Quadratic programming algorithms such as the kernel adatron (KA), sequential minimal optimization (SMO), iterative single data algorithm (ISDA) and so forth can be used to solve this problem. Then, the weights and bias, or equivalently, the estimate for each observation, are obtained using the following relations:

$$\begin{aligned} S = & \left\{ {g|0 < \alpha_{g}^{ + } - \alpha_{g}^{ - } < C} \right\}, \\ b = & \sum\limits_{s \in S}^{{}} {T_{s} - \left( {\sum\limits_{s \in S} {\left( {\alpha_{s}^{ + } - \alpha_{s}^{ - } } \right)\phi \left( {B_{g} ,B_{s} } \right) - \left( {\varepsilon \times sign\left( {\alpha_{s}^{ + } - \alpha_{s}^{ - } } \right)} \right)} } \right),} \\ f\left( {B_{g} } \right) = & \sum\limits_{s \in S}^{{}} {\left( {\alpha_{s}^{ + } - \alpha_{s}^{ - } } \right)\phi \left( {B_{g} ,B_{s} } \right) + b} , \\ g = & 1,2,...,G. \\ \end{aligned}$$
(12)

In (12), S is the support vector set, which is usually a small subset of the training dataset. The number of support vectors depends on the hyperparameters, including C, ε and the kernel function, as well as the structure of the problem. It governs model accuracy through a trade-off between a high-complexity model (more support vectors), which may over-fit the data, and a large margin (fewer support vectors), which will misfit some of the training data in the interest of better generalization.
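In practice, the dual problem (11) need not be coded by hand; off-the-shelf solvers, such as the SMO-based implementation behind scikit-learn's `SVR`, handle it internally. A minimal sketch follows, in which the data and hyperparameter values are illustrative only:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical G x U training inputs and noisy scalar targets
rng = np.random.default_rng(4)
B = rng.uniform(-1.0, 1.0, size=(80, 2))
T = np.sin(np.pi * B[:, 0]) + 0.1 * rng.standard_normal(80)

# C and epsilon play the same roles as in (10)-(11); the RBF
# kernel corresponds to the mapping phi.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(B, T)

n_support = len(model.support_)   # size of the support vector set S
pred = model.predict(B)
```

Tightening `epsilon` or raising `C` typically enlarges the support vector set, mirroring the complexity/margin trade-off described above.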

2.4 Evolutionary SVR

The combination of meta-heuristic and evolutionary algorithms (EAs) with machine learning techniques has received a great deal of attention in the past decade. The main aim of this hybridisation is to use an EA in the training or parameter tuning of a machine learning technique (Ojha et al. 2017). In pioneering work, Kim and Cho (2008) proposed an evolutionary neural network based on the genetic algorithm (GA), in which a speciation-based model was established through fitness sharing and the ANN was then incorporated via a behaviour knowledge space method. Owing to promising results, several other studies have investigated the performance of different EAs, such as the extended marine predators algorithm (EMPA), gradient-based optimization (GBO), moth-flame optimization (MFO) and the water cycle optimization algorithm (WCA) (Adnan et al. 2021a; Ikram et al. 2022a; Kadkhodazadeh and Farzin 2022). The integration of several EAs has also recently been explored in the literature; for example, Adnan et al. (2021b) implemented a combination of the PSO and grey wolf optimization (GWO) algorithms in the training of the extreme learning machine (ELM) technique.

Similarly, evolutionary SVR (ESVR) refers to the hybridization of an EA with SVR. This paradigm has been well received in the literature and can be categorized into three groups. The first group used EAs for the hyperparameter optimization of SVR; see, for instance, Adnan et al. (2022), Wang and Du (2014), Ikram et al. (2022b) and Al-Zoubi et al. (2021), in whose studies the optimum values for C, ε and the kernel function were acquired with different EAs. In the second group, EAs performed feature selection in combination with parameter optimization in the SVR training (Al-Zoubi et al. 2018; Ziani et al. 2017). In line with this paper’s objective, in the third group, researchers have applied EAs, instead of common quadratic programming solvers, to either the primal or dual problem (see Eqs. (10) and (11)). For example, Arana-Daniel et al. (2016), Zhang et al. (2016) and Dantas Dias and Rocha Neto (2017) used EAs, including the GA, differential evolution (DE), PSO and simulated annealing (SA), for support vector identification. One may ask why EAs would be preferred over common quadratic solvers. Arana-Daniel et al. (2016) and Dantas Dias and Rocha Neto (2017) reported that EAs have lower computational complexity and are easier to implement than the alternatives. That being said, there is no definitive answer to this question, as it depends on the nature of the problem.

2.5 PSO algorithm

Note that an EA is used in the third group of ESVR approaches mentioned in the previous subsection; in other words, the dual problem defined in (11) is solved with PSO, one of the most versatile algorithms. A survey of the related literature revealed that GA and PSO are the most widespread approaches for this application, with PSO better suited than GA to continuous variables; for example, the reasonable accuracy of PSO in comparison with some other EAs was reported in Dantas Dias and Rocha Neto (2017). Also, our simulations revealed that PSO achieved better accuracy than some other common EA techniques (a brief excerpt of the results is presented in the sensitivity analysis section). Therefore, in this paper, PSO is used to solve the SVR optimization problem.

The PSO idea was inspired by the migration of a flock of birds, in which individual knowledge and performance are determined with respect to the whole population. As is the general procedure in metaheuristic algorithms, the best solution is acquired by generating superior solutions from a specific population. Each candidate solution (sometimes called a particle) in the PSO algorithm has a position and a velocity, updated based on its own best solution, the current global best solution, and some random parameters. Its ability to steer particles toward the best position has made PSO an excellent evolutionary algorithm for continuous or nonlinear optimization problems. For more details about the PSO updating relations, readers are referred to Kennedy (2010). Some hyperparameters must be assigned in the PSO algorithm, including (i) the population size, denoted here by npopPSO, (ii) the number of iterations (maxItPSO), and (iii)–(iv) two coefficients weighting, respectively, the difference between a particle’s current position and its own best position, and the difference between its current position and the best position among all solutions.
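The mechanics just described can be sketched as follows; the hyperparameter values are illustrative defaults (not the tuned values used later), and the sphere function stands in for the actual SVR dual objective:

```python
import numpy as np

def pso_minimize(f, dim, n_pop=20, max_it=150, w=0.7, c1=1.5, c2=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Minimal PSO loop: each particle is pulled toward its own best
    position (coefficient c1) and the global best (coefficient c2)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_pop, dim))          # positions
    v = np.zeros((n_pop, dim))                     # velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()         # global best
    for _ in range(max_it):
        r1 = rng.random((n_pop, dim))
        r2 = rng.random((n_pop, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(np.min(pbest_val))

# Sphere function: minimum value 0 at the origin
best_x, best_val = pso_minimize(lambda z: float(np.sum(z ** 2)), dim=3)
```

For the ESVR, `f` would be the dual objective of (11) evaluated on a candidate α, with the constraints handled by clipping or penalties.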

Suppose our proposed method, hereafter denoted ESVR, is available after the training procedure. In general, some features or characteristics of a generated profile are extracted and imported into the ESVR, and the condition of the process (IC or OOC) is identified from its output. In this paper, the ESVR output, denoted by Oj, is compared with a predefined cutting value (CV), which plays the role of the upper control limit (UCL) in common control charts. With our training procedure, there is no need to define a lower control limit (LCL); this is consistent with some previous works, for example Hosseinifard et al. (2011), who set the LCL at 0. For better understanding, Fig. 1 depicts the conceptual model of the ESVR for deciding on the process condition.

Fig. 1
figure 1

Basic model of ESVR for determination of process condition at the jth generated profile in Phase II

The determination of the input features is the next key step; however, it received little attention in previous machine learning based control charts. For example, only the parameter estimates were imported as input features in Hosseinifard et al. (2011), although adding other input features can significantly improve the detection of OOC profiles in general. Proper input features should not only reflect the properties of the process parameters but also embrace the effect of the former samples in the contemporaneous statistic. As one of the contributions of this study, the input features of the jth generated profile consist of four major groups:

  • The normalized estimated parameters: After estimating the parameters by the IWLS algorithm and considering the p-dimensional normal approximation \(\mathop {{\beta}} \limits^{\frown }_{j} \sim N_{p} \left( {{\varvec{\beta}}_{0} ,\left( {\tilde{{{\varvec{X}}^{\prime } }}W\tilde{{\varvec{X}}}} \right)^{ - 1} } \right)\) of Yeh et al. (2009), the parameter estimates are scaled through (13) (more details are provided in Sect. 4.2 of Johnson and Wichern (2007)):

    $$\begin{gathered} {\varvec{\beta}} _{{\varvec{j}}}^{\prime } = \left( {\tilde{{\varvec{X}}}^{\prime } W\tilde{{\varvec{X}}}} \right)^{{ - \frac{1}{2}}} \times \left( {\mathop {{\varvec{\beta}} _{{\varvec{j}}} }\limits^{\frown } - {\varvec{\beta}} _{0} } \right)^{T} , \hfill \\ {\varvec{\beta}} _{{\varvec{j}}}^{\prime } = \left( {{{\beta}} _{{1j}}^{\prime } ,\beta _{{2j}}^{\prime } ,...,\beta _{{pj}}^{\prime } } \right). \hfill \\ \end{gathered}$$
    (13)
  • The normalized average of the responses: Considering (2), we have yij \(\sim\) Poisson(µij) for i = 1, 2,…, n. Although the exact distribution of the average of the responses in the jth profile is not known, the central limit theorem enables us to scale it as:

    $$\overline{y}_{j}^{\prime } = \frac{{\overline{y}_{j} - \frac{{\sum\limits_{i = 1}^{n} {\mu_{0i} } }}{n}}}{{\frac{{\sqrt {\sum\limits_{i = 1}^{n} {\mu_{0i} } } }}{n}}},$$
    (14)

    in such a way that

    $$\overline{y}_{j} = \frac{{\sum\limits_{i = 1}^{n} {y_{ij} } }}{n},$$
    (15)

    where log(µ0) and µ0 have been defined after (3). These parameters are imported in EWMA form; in other words, an EWMA form of the p + 1 parameters \(\left( {EWMA_{Pj} = \left[ {\beta_{1j}^{\prime } ,\beta_{2j}^{\prime } ,...,\beta_{pj}^{\prime } ,\overline{y}_{j}^{\prime } } \right]} \right)\) is computed for each generated profile as in (4), with the initial values [0, 0,…, 0]p+1.

  • The ratio of MEWMA statistics: The better detection ability of the runs-rules monitoring schemes proposed by Yeganeh and Shadman (2020) and Yeganeh et al. (2021) led us to adopt a similar approach in this paper. They applied the ratio of points as a supplementary tool to increase the chart’s performance. Because of the complexity of designing run-rules schemes, the ratio of points is instead used among the input features. To this end, UCLMEWMA is obtained by specifying a desired ARL0 considering only the MEWMA chart (this has been reported in Table 2 of Qi et al. (2016) for an ARL0 of 370). Then, the MEWMA statistics are computed using (5) up to the jth profile, and the numbers of samples falling in the three regions \(\left( {0,\frac{{UCL_{MEWMA} }}{2}} \right)\), \(\left( {\frac{{UCL_{MEWMA} }}{2},UCL_{MEWMA} } \right)\) and beyond the control limit \(\left( {UCL_{MEWMA} , + \infty } \right)\) are counted and denoted as \(d_{MEWMA}^{(1)}\), \(d_{MEWMA}^{(2)}\) and \(d_{MEWMA}^{(3)}\), respectively. The 1 × 4 vector \(\left( {\left[ {\frac{{d_{MEWMA}^{(1)} }}{j},\frac{{d_{MEWMA}^{(2)} }}{j},\frac{{d_{MEWMA}^{(3)} }}{j},M_{j} } \right]} \right)\) is imported to the ESVR to incorporate the effect of previous samples, in a similar fashion to run-rules.

  • The ratio of LRT statistics: Because the LRT chart outperforms the MEWMA chart in detecting large shifts of Poisson profiles (see, for example, Table 3 of Qi et al. (2016)), this statistic is also added to the input features by defining UCLLRT. Hence, using a similar approach to the previous point and using (6), the 1 × 4 vector \(\left( {\left[ {\frac{{d_{LRT}^{(1)} }}{j},\frac{{d_{LRT}^{(2)} }}{j},\frac{{d_{LRT}^{(3)} }}{j},LRT_{j} } \right]} \right)\) is computed and imported to the ESVR.

By these definitions, ESVR has a (p + 1 + 4 + 4)-dimensional input vector \(\left( {I_{j} = \left[ {EWMA_{Pj} ,\frac{{d_{MEWMA}^{(1)} }}{j},\frac{{d_{MEWMA}^{(2)} }}{j},\frac{{d_{MEWMA}^{(3)} }}{j},M_{j} ,\frac{{d_{LRT}^{(1)} }}{j},\frac{{d_{LRT}^{(2)} }}{j},\frac{{d_{LRT}^{(3)} }}{j},LRT_{j} } \right]} \right)\). Many investigations were conducted to arrive at the above four groups, which proved to be the best input combination for reaching the minimum ARL1; this is discussed further in the sensitivity analysis, where ESVRs with different input structures are compared.
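As an illustration, the assembly of the input vector Ij described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the IWLS estimate, the charting statistics Mj and LRTj, the control limits and the smoothing constant are taken as given, and all helper names are hypothetical.

```python
import numpy as np

def update_input_vector(beta_hat, y, state, beta0, XtWX, mu0,
                        ucl_mewma, ucl_lrt, m_stat, lrt_stat, lam=0.2):
    """Sketch of assembling the (p+9)-dimensional ESVR input for profile j.

    `state` carries the EWMA vector and the region counters d^(1..3)
    accumulated over the first j profiles; all names are illustrative."""
    j = state["j"] = state["j"] + 1
    n = len(y)

    # (13): scale the IWLS estimates with the inverse square root of X'WX
    eigval, eigvec = np.linalg.eigh(XtWX)
    inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    beta_prime = inv_sqrt @ (beta_hat - beta0)

    # (14)-(15): normalize the mean response via the central limit theorem
    y_bar_prime = (y.mean() - mu0.sum() / n) / (np.sqrt(mu0.sum()) / n)

    # EWMA smoothing of the p+1 normalized features, initialized at zero
    features = np.append(beta_prime, y_bar_prime)
    state["ewma"] = lam * features + (1 - lam) * state["ewma"]

    # region counters: below UCL/2, between UCL/2 and UCL, beyond UCL
    for key, stat, ucl in (("mewma", m_stat, ucl_mewma),
                           ("lrt", lrt_stat, ucl_lrt)):
        d = state[key]
        d[0 if stat < ucl / 2 else 1 if stat < ucl else 2] += 1

    return np.concatenate([state["ewma"],
                           np.array(state["mewma"]) / j, [m_stat],
                           np.array(state["lrt"]) / j, [lrt_stat]])
```

The returned vector stacks the EWMA block, the three MEWMA region ratios with Mj, and the three LRT region ratios with LRTj, matching the ordering of Ij above.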

Also, some interesting observations about the proposed (p + 9)-dimensional input vector (i.e., Ij) of ESVR were obtained from simulations (the results are not reported here to conserve space). First, to estimate W defined after (3), a number of researchers have suggested using the current profile information instead of the IC values (see, e.g., Huwang et al. (2016)); however, our simulation study revealed better results with the IC values, so we computed W with the IC model instead of the current profiles. The same applies to the estimation of \(\overline{y}_{j}^{\prime }\) in (14). The second point concerns the ratio of samples beyond the control limits \(\left( {\frac{{d_{MEWMA}^{(3)} }}{j},\frac{{d_{LRT}^{(3)} }}{j}} \right)\). The main reason for including it in the proposed method is to improve the robustness of ESVR to large simultaneous shifts. In other words, the effects of the MEWMA ratios \(\left( {\frac{{d_{MEWMA}^{(1)} }}{j},\frac{{d_{MEWMA}^{(2)} }}{j}} \right)\), the LRT ratios \(\left( {\frac{{d_{LRT}^{(1)} }}{j},\frac{{d_{LRT}^{(2)} }}{j}} \right)\) and the beyond-control-limit ratios \(\left( {\frac{{d_{MEWMA}^{(3)} }}{j},\frac{{d_{LRT}^{(3)} }}{j}} \right)\) manifest themselves in small, large single and large simultaneous shifts, respectively. As a third point, one may suggest using the WLRT statistic instead of LRT or MEWMA because of its superior performance. This is also possible and might improve the results; however, WLRT needs more computational time because it uses all available samples up to the current time point j, especially for the detection of small shifts.

To conduct the simulations, the (p + 9)-dimensional input vector of ESVR (Ij) is computed for each generated profile and imported to the ESVR. Then, Oj is compared with CV and an OOC signal is triggered when CV < Oj (see Fig. 1). To compute ARL1 and SDRL1, this procedure is iterated in several Monte Carlo simulations and the signalling times are stored as the run lengths (RL). The process of obtaining ARL1 and SDRL1 for a desired shift with MaxIt iterations is illustrated in Pseudocode 1.

figure a
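In the spirit of Pseudocode 1 (a sketch, not the authors' exact pseudocode), the Monte Carlo run-length loop can be written as below; `esvr_output` and `generate_profile` are hypothetical placeholders for the trained ESVR and the profile generator.

```python
import numpy as np

def estimate_arl(esvr_output, generate_profile, cv, max_it=1000, max_rl=10_000):
    """Monte Carlo estimate of the ARL and SDRL for a given shift.

    `esvr_output` maps a profile to O_j; `generate_profile` yields one
    (possibly shifted) profile; both are illustrative placeholders."""
    run_lengths = []
    for _ in range(max_it):
        rl = 0
        while rl < max_rl:
            rl += 1
            o_j = esvr_output(generate_profile())
            if o_j > cv:      # OOC signal when O_j exceeds the cutting value
                break
        run_lengths.append(rl)
    run_lengths = np.asarray(run_lengths, dtype=float)
    return run_lengths.mean(), run_lengths.std(ddof=1)
```

Running the same loop with no shift in `generate_profile` gives the ARL0 estimate used later for calibrating the CV.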

3 Training of the proposed method

In the previous section, it has been assumed that the ESVR has already been trained. To train it, a G × (p + 10) training dataset similar to the one in Hosseinifard et al. (2011) is generated. To this end, 0.5G IC profiles and 0.5G OOC profiles (with some desired shifts) are generated and a (p + 9)-dimensional input vector is computed for each generated profile. The target values for IC and OOC profiles are 0 and 1, respectively; that is, \(T_{1} = T_{2} = ... = T_{\frac{G}{2}} = 0;T_{{1 + \frac{G}{2}}} = T_{{2 + \frac{G}{2}}} = ... = T_{G} = 1\).
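The construction of this training set can be sketched as follows, with hypothetical generator functions standing in for the feature extraction described in the previous section:

```python
import numpy as np

def build_training_data(make_ic_input, make_ooc_input, g=2400):
    """Sketch of the G x (p+10) training set: half IC (target 0), half OOC
    (target 1). `make_ic_input`/`make_ooc_input` are placeholders that each
    return one (p+9)-dimensional input vector."""
    half = g // 2
    inputs = np.vstack([make_ic_input() for _ in range(half)] +
                       [make_ooc_input() for _ in range(half)])
    targets = np.concatenate([np.zeros(half), np.ones(half)])
    return np.column_stack([inputs, targets])   # G rows, p+10 columns
```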

As mentioned previously, the dual problem in (11) is solved using the PSO algorithm, and the optimum (minimum) value of the objective function is reached by assigning values to the 2G variables \(\left( {\alpha = \left[ {\alpha_{1}^{ + } ,\alpha_{2}^{ + } ,...,\alpha_{G}^{ + } ,\alpha_{1}^{ - } ,\alpha_{2}^{ - } ,...,\alpha_{G}^{ - } } \right]} \right)\). However, as one of the contributions of this paper, some slight changes are made to the objective function of (11). This modification follows Zhang et al. (2016), in which an additional coefficient was added to the objective function of the primal problem.

Before identifying the additional terms of the objective function, it is worth noting the challenge posed by the relation between common accuracy criteria, such as the mean square error (MSE), and the ARL when designing control charts based on machine learning techniques. For a usual ANN, SVM, SVR or other machine learning technique, the training process continues until a desired threshold is reached; whereas classical Phase II control charts are evaluated in terms of the ARL, and there is no direct relationship between the two approaches. This challenge has been discussed in detail by Yeganeh and Shadman (2020), who suggested a heuristic solution similar to a design-of-experiments approach for ANN training, which is not the focus of this paper.

Since the values 0 and 1 have been assigned to the IC and OOC profiles as target values, respectively, and the process condition is identified through the CV, it was observed that higher performance (i.e., lower ARL1 for a desired ARL0) occurs when the difference between the OOC and IC estimated target values is at its maximum. In other words, in a common situation, the outputs of an ANN or SVR tend towards 0 and 1 for IC and OOC profiles and the CV is obtained closer to 1 to reach a desired ARL0 (see Hosseinifard et al. (2011)); the greater the difference between the outputs, the lower the ARL1 value. Thus, some criteria are needed to capture the significance of the difference between the IC and OOC ESVR outputs.

To this end, the output of each input in the dataset is obtained using (12). Suppose that it is denoted by \(\hat{T}_{g} ;g = 1,2,...,G\) which is equivalent to f(Bg) given in (12) where \(g = 1,2,...,\frac{G}{2}\) are the predicted IC values and others are for OOC profiles. Therefore, the dual problem can be revised as follows:

$$\begin{gathered} \mathop {\min }\limits_{{\left( {\alpha_{g}^{ + } ,\alpha_{g}^{ - } } \right)}} \,\,MSE + DAVE + DR \hfill \\ {\text{subject}}\,{\text{to}}\left\{ \begin{gathered} \sum\limits_{g = 1}^{G} {\left( {\alpha_{g}^{ + } - \alpha_{g}^{ - } } \right) = 0{,}} \hfill \\ 0 \le \alpha_{g}^{ + } \le C\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall g{,} \hfill \\ 0 \le \alpha_{g}^{ - } \le C\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\forall g{,} \hfill \\ \end{gathered} \right. \hfill \\ MSE = \frac{{\sum\limits_{g = 1}^{G} {\left( {T_{g} - \hat{T}_{g} } \right)^{2} } }}{G}{,} \hfill \\ DAVE = \frac{{\sum\limits_{g = 1}^{\frac{G}{2}} {\left( {\hat{T}_{g} } \right)} }}{\frac{G}{2}} - \frac{{\sum\limits_{{g = 1 + \frac{G}{2}}}^{G} {\left( {\hat{T}_{g} } \right)} }}{\frac{G}{2}}{,} \hfill \\ DR = \mathop {range}\limits_{{g = 1,2,...,\frac{G}{2}}} \left( {\hat{T}_{g} } \right)\,\,\, - \mathop {range}\limits_{{g = 1 + \frac{G}{2},2 + \frac{G}{2},...,G}} \left( {\hat{T}_{g} } \right){.} \hfill \\ \end{gathered}$$
(16)
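The three components of (16) can be computed directly from the predicted targets; a minimal sketch, where `np.ptp` gives the range (maximum minus minimum):

```python
import numpy as np

def esvr_objective(t_hat, g):
    """MSE + DAVE + DR as in (16); the first G/2 entries of t_hat are the
    predicted IC targets, the remainder the predicted OOC targets."""
    half = g // 2
    targets = np.concatenate([np.zeros(half), np.ones(half)])
    ic, ooc = t_hat[:half], t_hat[half:]
    mse = np.mean((targets - t_hat) ** 2)
    dave = ic.mean() - ooc.mean()     # pushes OOC outputs above IC outputs
    dr = np.ptp(ic) - np.ptp(ooc)     # difference of the output ranges
    return mse + dave + dr
```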

The proposed objective function consists of three components. The first one is the common MSE criterion, which is frequently utilized in machine learning applications but, as mentioned, cannot by itself lead to the minimum ARL1; hence, two other components, the difference of averages (DAVE) and the difference between the ranges (i.e., maximum–minimum) of the outputs (DR), are appended to the objective function. Simulation studies show that the proposed training approach converges to a solution with minimum ARL1 values or, equivalently, quicker OOC detection. Based on our simulations, the effect of these terms is that the IC and OOC outputs attain the maximum difference, which leads to the minimum ARL1, while the MSE scales the outputs and prevents the output values from growing without bound. Note that the obtained CV can be increased above 1 to reach a specific ARL0.

The ideal values of MSE, DAVE and DR are 0, − 1 and − 1, respectively, when \(\hat{T}_{1} = \hat{T}_{2} = ... = \hat{T}_{\frac{G}{2}} = 0\); \(\hat{T}_{{1 + \frac{G}{2}}} = \hat{T}_{{2 + \frac{G}{2}}} = ... = \hat{T}_{G} = 1\); thus, the best value of the proposed objective function is − 2. Note that the first term of the primal objective function in (11) (i.e. \(0.5(\alpha^{\prime}H\alpha ) + \tilde{q}\alpha\)) may also be included in (16); however, this is not recommended since it is redundant, yielding the same outcome in this condition while adding complexity.

Considering a training dataset with G elements, the optimization problem in (16), with its 2G variables, is solved by PSO. To this end, initial solutions of size npopPSO are randomly generated and updated with the PSO algorithm, where each member of the population is a vector of size 2G. By changing the variables’ values \(\left( {\alpha_{g}^{ + } ,\alpha_{g}^{ - } } \right)\) in (16) during the PSO implementation, the output of each input \(\left( {\hat{T}_{g} ;g = 1,2,...,G} \right)\) is computed by (12); in other words, the function evaluation (obtaining the objective function) is carried out by computing \(\hat{T}_{g}\) for g = 1, 2,…, G. The process terminates when the objective function reaches its ideal value BestSol (i.e., − 2) or the iteration number exceeds maxItPSO. The framework of ESVR training is illustrated in Pseudocode 2.

figure b
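A compact PSO sketch for minimizing (16) over the box 0 ≤ α ≤ C is given below. This is illustrative only: the equality constraint Σ(αg+ − αg−) = 0 is omitted for brevity (it could be handled, e.g., with a penalty term), and all parameter values are assumptions rather than the authors' settings.

```python
import numpy as np

def pso_minimize(objective, dim, c_box, n_pop=30, max_it=100,
                 w=0.7, c1=1.5, c2=1.5, best_sol=-2.0, seed=0):
    """Minimal PSO sketch: each particle is a 2G-vector of
    (alpha+, alpha-) values kept inside the box [0, C]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, c_box, (n_pop, dim))        # positions
    v = np.zeros((n_pop, dim))                     # velocities
    p_best = x.copy()
    p_val = np.array([objective(xi) for xi in x])
    g_best = p_best[p_val.argmin()].copy()
    for _ in range(max_it):
        r1, r2 = rng.random((2, n_pop, dim))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, 0, c_box)               # enforce 0 <= alpha <= C
        vals = np.array([objective(xi) for xi in x])
        improved = vals < p_val
        p_best[improved], p_val[improved] = x[improved], vals[improved]
        g_best = p_best[p_val.argmin()].copy()
        if p_val.min() <= best_sol:                # stop at the ideal value
            break
    return g_best, p_val.min()
```

In the actual training, `objective` would evaluate (16) by computing \(\hat{T}_{g}\) through (12) for every element of the training set.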

After training the ESVR, the CV value is adjusted such that the desired ARL0 is reached. This is done using the algorithm provided in Pseudocode 1, with no shifts applied during profile generation. Note that UCLMEWMA and UCLLRT are constant during the training.
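One simple way to carry out this adjustment is a bisection search on the CV, under the assumption that ARL0 increases monotonically with CV; `arl0_of_cv` below is a hypothetical wrapper that runs Pseudocode 1 with no shift at a given CV.

```python
def calibrate_cv(arl0_of_cv, target=370.0, lo=0.0, hi=5.0, tol=1.0, max_it=40):
    """Bisection sketch for the cutting value: raise CV while the
    in-control ARL is below the target, lower it otherwise."""
    mid = 0.5 * (lo + hi)
    for _ in range(max_it):
        mid = 0.5 * (lo + hi)
        arl0 = arl0_of_cv(mid)
        if abs(arl0 - target) <= tol:
            break
        if arl0 < target:
            lo = mid    # too many false alarms: raise the cutting value
        else:
            hi = mid
    return mid
```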

4 Performance comparisons

Motivated by Qi et al. (2016) and Shang et al. (2018), three different OOC situations, namely a parametric model with fixed explanatory variables, a parametric model with non-fixed explanatory variables and a non-parametric model, have been simulated in this section to evaluate the performance of the proposed approach (the competitors’ results are extracted from the above-mentioned references). The model parameters are provided in Table 1.

Table 1 The preassigned model parameters

Based on Qi et al. (2016), the IC model for the fixed and random design points was assumed to be:

$$\begin{gathered} {\varvec{\beta}}_{0} = [1\,\,1]{,} \hfill \\ \tilde{{\varvec{X}}}^{\prime} = \left( {\begin{array}{*{20}c} 1 & 1 & \cdots & 1 \\ {0.1} & {0.2} & \cdots & 1 \\ \end{array} } \right){,} \hfill \\ n = 10{,}\;p = 2. \hfill \\ \end{gathered}$$
(17)

The OOC profile parameters (denoted as βOOC) were generated such that

$$\begin{gathered} {\varvec{\beta}}_{{{\varvec{OOC}}}} = {\varvec{\beta}}_{{\varvec{0}}} + \Delta {,} \hfill \\ \sigma_{1} = 0.3518{,}\sigma_{2} = 0.5095{,} \hfill \\ \end{gathered}$$
(18)

where \(\Delta = \left( {\delta_{1} \sigma_{1} ,\delta_{2} \sigma_{2} } \right)\) represents the magnitude of the shifts in terms of standard deviations. To generate the training dataset, in addition to 1200 IC profiles, three sets of 400 OOC profiles each were generated with shift magnitudes δ1 = 0.2, δ2 = 0.2 and δ1 = δ2 = 0.2, giving G = 2400.
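Under the IC model (17) and the shift structure (18), a single profile can be generated as sketched below (the Poisson GLM uses the log link as in (2)):

```python
import numpy as np

def generate_profile(delta1=0.0, delta2=0.0, rng=None):
    """Generate one Poisson profile under the IC model (17), optionally
    shifted as in (18): beta_OOC = beta_0 + (delta1*sigma1, delta2*sigma2)."""
    rng = rng or np.random.default_rng()
    x = np.column_stack([np.ones(10), np.arange(0.1, 1.01, 0.1)])  # n=10, p=2
    beta = np.array([1.0, 1.0]) + np.array([delta1 * 0.3518, delta2 * 0.5095])
    mu = np.exp(x @ beta)              # log link of the Poisson GLM
    return rng.poisson(mu)
```

Calling it with `delta1 = delta2 = 0` yields an IC profile; nonzero deltas give the OOC profiles used for training and evaluation.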

4.1 ARL1 values for the fixed design points condition

Four competitors (namely the LRT, MEWMA, LRT-EWMA and WLRT schemes) were compared with the proposed ESVR method. After the training procedure, the CV was set equal to 2.12 by simulation to reach ARL0 = 370 (i.e., by implementing Pseudocode 1 with no shift). Table 2 reports the values of ARL1 and SDRL1 (in parentheses) at different shift magnitudes. Note that boldfaced values denote the best-performing scheme.

Table 2 Comparison of ARL1 (SDRL1) for the Poisson profiles with fixed design points

The proposed ESVR scheme yielded lower values of ARL1 and SDRL1 regardless of the size of the shifts, giving ESVR an advantage over the other competitors. A tangible reduction in the values of ARL1 and SDRL1 can be seen for most of the shifts; for example, the ARL1 values in the first row were 30.1, 44.8, 153, 201 and 365 for the ESVR, WLRT, LRT-EWMA, LRT and MEWMA schemes, respectively. Comparable performance over a wide range of shifts indicates that the training procedure of the ESVR works very well, making ESVR a robust control chart over different types of shifts. In other words, although only one shift magnitude was taught to the ESVR during training, it detected the other OOC shifts quickly as well.

As another finding, the MEWMA and LRT schemes performed worse than the WLRT for most of the shifts, which reveals that the combination of two or more control charts (such as the combination of EWMA and LRT in the construction of WLRT and LRT-EWMA) can increase the performance of the resulting methods (Qi et al. 2017). The WLRT performs much better especially for small and moderate shifts; for instance, when (δ1 = 0.2, δ2 = 0) the ARL1 and SDRL1 of the WLRT are about five times smaller than those of the LRT. This idea can be extended to ESVR schemes to increase their performance. Based on this fact, we can conclude that one of the main reasons for the superior performance of ESVR could be the combination of the LRT and MEWMA statistics.

4.2 ARL1 values for the random design points condition

Similar to Qi et al. (2016), random explanatory variables with n = 9 were generated using the same IC model. To this end, one of the ten design points was selected at random from a discrete uniform distribution over the integers 1 to 10 and deleted, yielding a random design point set with n = 9. To obtain a more robust scheme, a new ESVR was not trained in this case; only the data generation procedure was changed to use the random design points. The results of this case are gathered in Table 3.
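The random-design sampling described above amounts to deleting one of the ten fixed design points uniformly at random; a minimal sketch:

```python
import numpy as np

def random_design_points(rng=None):
    """Sketch of the random-design case: drop one of the ten fixed design
    points 0.1, 0.2, ..., 1.0 uniformly at random, leaving n = 9."""
    rng = rng or np.random.default_rng()
    full = np.round(np.arange(0.1, 1.01, 0.1), 1)
    drop = rng.integers(0, 10)      # index of the deleted design point
    return np.delete(full, drop)
```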

Table 3 Comparison of ARL1 (SDRL1) for the Poisson profiles with random design points

Since the same control limits from the previous subsection were used, the ARL0 is not exactly equal to 370 (the results of the IC situation are shown in the first row). The OOC results revealed the superiority of the newly proposed ESVR over the other competitors in the case of random design points, with properties similar to the fixed design point case. Regardless of the size of the shift, the ESVR performs best in terms of ARL values, followed by the WLRT chart. In terms of SDRL, the ESVR outperformed its competitors for moderate and large shifts, while the WLRT performed best among all competing charts for small shifts; for instance, for the shift of magnitude (δ1 = 0.2, δ2 = 0), the minimum SDRL1, i.e., 45.2, was achieved by the WLRT. The same conclusion can be drawn for smaller shifts. Shang et al. (2011) and Song et al. (2021) hinted at the complexity of monitoring profiles with random design points. Comparing the results of Tables 2 and 3, it can be seen that both ARL1 and SDRL1 are much larger for a random design, which confirms Shang et al. (2011) and Song et al. (2021)’s findings.

4.3 ARL1 values for the non-parametric condition

Non-parametric monitoring refers to OOC conditions in which the type of OOC profile is not known and the IC model can transform into any possible shape. Because the whole profile relationship may change, the OOC situation is usually described through scenarios in non-parametric conditions (Zou et al. 2008; Shang et al. 2018; Abbasi et al. 2022). Note that there is no prior research on non-parametric Poisson profiles, so all the values here have been obtained by our simulations. To simulate this situation, two different OOC scenarios were investigated with the fixed design points and the IC model of (17). Equations (19) and (20) represent the OOC model in each scenario. Equation (19) presents the OOC model of scenario I.

$$\begin{gathered} {\varvec{Y}}_{j} = Poisson\left( {{\varvec{\mu}}_{j} } \right){,} \hfill \\ j = 1{,}2{,}...{,} \hfill \\ \log \left( {{\varvec{\mu}}_{j} } \right) = \tilde{{\varvec{X}}}{\varvec{\beta}}_{j} + \delta_{3} \cos \left( {2\pi \tilde{{\varvec{X}}}^{\prime}_{pure} } \right){,} \hfill \\ \end{gathered}$$
(19)

where \(\tilde{{\varvec{X}}}^{\prime}_{pure}\) denotes the explanatory variables without the intercept term, i.e., \(\tilde{{\varvec{X}}}^{\prime}_{pure} = \left( {\begin{array}{*{20}c} {0.1} & {0.2} & \cdots & 1 \\ \end{array} } \right)\). The results of ARL1 (SDRL1) for this scenario are displayed in Table 4.
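Scenario I can be simulated by adding the cosine disturbance to the IC log-mean; a sketch under the fixed design of (17), with the IC coefficients β0 = (1, 1):

```python
import numpy as np

def scenario1_profile(delta3, rng=None):
    """Non-parametric scenario I of (19): a cosine disturbance of size
    delta3 added to the IC log-mean, log(mu) = X~ beta_0 + delta3*cos(2 pi x)."""
    rng = rng or np.random.default_rng()
    x_pure = np.round(np.arange(0.1, 1.01, 0.1), 1)
    log_mu = (1.0 + 1.0 * x_pure) + delta3 * np.cos(2 * np.pi * x_pure)
    return rng.poisson(np.exp(log_mu))
```

Setting `delta3 = 0` recovers the IC model; scenario II would replace the cosine term with the reciprocal disturbance of (20).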

Table 4 Comparison of ARL1 (SDRL1) for the non-parametric Poisson profiles in scenario I

Equation (20) provides the OOC model of scenario II. The results of ARL1 (SDRL1) for this scenario are displayed in Table 5.

$$\begin{gathered} {\varvec{Y}}_{j} = Poisson\left( {{\varvec{\mu}}_{j} } \right){,} \hfill \\ j = 1{,}2{,}...{,} \hfill \\ \log \left( {{\varvec{\mu}}_{j} } \right) = \tilde{{\varvec{X}}}{\varvec{\beta}}_{j} + \frac{{\delta_{3} }}{{\tilde{{\varvec{X}}}^{\prime}_{pure} }}{,} \hfill \\ \tilde{{\varvec{X}}}^{\prime}_{pure} = \left( {\begin{array}{*{20}c} {0.1} & {0.2} & \cdots & 1 \\ \end{array} } \right). \hfill \\ \end{gathered}$$
(20)
Table 5 Comparison of ARL1 (SDRL1) for the non-parametric Poisson profiles in scenario II

Comparing the two scenarios, OOC conditions were detected sooner under the first scenario for nearly all control charts; for example, the ARL1 (SDRL1) for the LRT scheme were 138.0 (127.7) and 228.9 (222.4) in scenarios I and II, respectively. This indicates that the cyclic patterns of Eq. (19) are easier to detect than the OOC term in Eq. (20). The ESVR scheme turned out to be the best among all competing schemes. In general, compared with the LRT and MEWMA schemes, the ESVR scheme shows both robustness and sensitivity to complete changes in the profile type across different shifts; the MEWMA has lower SDRL1 for small shifts in scenario I, but its performance was not comparable in the second scenario, which indicates a lack of robustness with respect to the size of the shift. One of the main reasons for this phenomenon is that most existing statistical control charts have been developed under fundamental assumptions about the properties of the process (Montgomery 2019; Gupta et al. 2006). While the process fully satisfies these assumptions, statistical control charts perform well, but their detection ability deteriorates in other situations such as non-parametric models, complex relational forms and so forth. In such conditions, several studies have reported that machine learning techniques can be superior to statistical methods and more robust (Yeganeh et al. 2022a, b; Pacella and Semeraro 2011; Mohammadzadeh et al. 2021; Chen et al. 2020). As expected, the ESVR scheme, being a machine learning technique, outperformed the statistical approaches, attaining the lowest values of ARL1 and SDRL1.

5 Sensitivity analysis

This section provides six different sensitivity analyses. First, the effect of the proposed input structure and training algorithm is evaluated against other machine learning techniques. Secondly, the detection ability under other desired ARL0 values is reported. Thirdly, a sensitivity analysis over different n values is performed. In the fourth part, the detection ability is increased with some run-rules. The effect of PSO in the training of ESVR is investigated in the fifth part and, finally, the merit of the proposed input structure is demonstrated in the last subsection. All the simulations have the same setups as in Sect. 5.1.

5.1 ARL1 comparisons under different machine learning techniques

To show the capability of the proposed input layer and training method, four different scenarios were designed with ANNs and usual SVRs. In the first setting, a common ANN with the back-propagation algorithm, called ANN-BP1, was trained as in Hosseinifard et al. (2011), i.e. the inputs were the estimated coefficients. Then, an ANN with the proposed input layer structure (i.e., 11 neurons in the input layer), called ANN-BP2, was trained in a similar way to ESVR. Both networks were trained with the ‘feedforwardnet’ function in MATLAB 2018 and have two hidden layers. Moreover, two SVRs, called SVR1 and SVR2, were trained with the same inputs as ANN-BP1 and ANN-BP2, respectively, using the ‘fitrsvm’ function. With these adjustments, ANN-BP1 and SVR1 assess the proposed input structure, while the performance of the training method is evaluated against ANN-BP2 and SVR2. The results of these setups are displayed in Table 6.

Table 6 Comparison of ARL1 (SDRL1) for different machine learning schemes in fixed design points

As can be seen, the ESVR outperformed ANN-BP1, ANN-BP2 and SVR2 for most of the shifts, whereas for small single positive shifts it performed worse than SVR1. Although the ESVR had lower ARL1 values than SVR1 for most of the shifts, the simpler training of SVR1 might seem to undermine the case for ESVR. A closer look at the ARL1 for different shifts, including negative shifts and simultaneous positive and negative shifts (results not shown), revealed that SVR1 suffers from the bias effect, meaning an inability to detect some shifts (this is also shown for ANN-BP2 in the last two rows of Table 6). As pointed out by Huwang et al. (2014), machine learning techniques suffer from the bias effect, meaning that, for such control charts, OOC signals are not triggered for some shifts; thus, some remedial actions should be considered. However, this bias and poor detection ability were not observed for the ESVR scheme.

5.2 ARL1 comparisons under an ARL0 value of 200

In Fig. 2, the simulation adjustments were made to obtain ARL0 = 200 under the fixed design points, and the ARL1 and SDRL1 values are reported in Panels (a) and (b), respectively. Figure 2a, b illustrates that the ESVR scheme remains superior in this new condition, which reveals the robustness of the ESVR scheme with respect to the type I error and/or ARL0 (Abbas et al. 2016). The comparative analysis in Fig. 2 remains valid for other values of ARL0, but these are not reported for the sake of brevity. Note that the results were obtained using the previously trained ESVR, where the CV was decreased to reach the desired ARL0 value of 200.

Fig. 2

The results of a ARL1 and b SDRL1 values for MEWMA, LRT and ESVR methods when ARL0 = 200

5.3 ARL1 comparisons under different n values

To study the effect of different sample sizes with ARL0 = 200, we also set n = 5, 15 and 20, keeping the step of 0.1 between design points; for example, \(\tilde{{\varvec{X}}}^{\prime}_{pure}\) = (0.1 0.2 … 1.5) when n = 15. Figure 3a, b illustrates comparisons for different values of n in terms of ARL1 and SDRL1, respectively. The results, obtained without retraining, were as expected for SDRL1, while the ARL1 values showed an unusual pattern for some moderate shifts, since one would expect that the greater the value of n, the lower the ARL1 for a specific shift. Another unusual pattern occurred for the third and fourth shifts when n = 5 (blue line): larger shifts had larger ARL1 and SDRL1. This may be due to the small sample size and biased parameter estimates. Montgomery (2019), Haq (2020) and Abbasi et al. (2022) mentioned that detecting OOC conditions in processes with variable sample sizes raises challenges such as variance inflation, and that statistical control charts need adaptive schemes to reduce this effect. The results indicate that ESVR can perform better as an adaptive scheme in the case of variable sample sizes. Conceptually speaking, the value of n is known in Phase II, so control charts are designed for a fixed n, and it is more practical to use ESVR in this situation with a specific training for each n. However, this is not the focus of this paper and will be reported in future studies.

Fig. 3

The results of a ARL1 and b SDRL1 values for different n when ARL0 = 200

5.4 Increasing the detection ability with run-rules

To improve the sensitivity of control charts, many techniques, including runs-rules, adaptive methods, variable sampling designs and mixed procedures, have been recommended in the area of profile monitoring (Haq 2020; Mohammadzadeh et al. 2021). For instance, adding runs-rules to a basic control chart can increase its ability to quickly detect shifts of different magnitudes. The variable sampling interval (VSI) technique, in which samples are taken at shorter intervals when there is potential for a shift in the IC model and at longer intervals in routine situations, has also been utilised in profile monitoring. Other adaptive methods, including modified successive sampling and ranked set based approaches, have been considered in profile monitoring as well (Maleki et al. 2018; Woodall 2007). As a particularly profitable technique, run-rules were extended by Yeganeh et al. (2021), Yeganeh and Shadman (2020) and Yeganeh and Shadman (2021), where they were used in the form of a rule matrix. To construct a rule matrix, the IC region is divided into several regions by a heuristic approach. The number of regions and the ratios of points falling in them are then compared with prespecified values (i.e., thresholds), and an OOC signal is obtained when they fall beyond these thresholds.
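Loosely following the rule-matrix idea (the actual regions and thresholds in the cited works differ; the values here are purely illustrative), a region-ratio check might look like:

```python
import numpy as np

def rule_matrix_signal(points, edges, thresholds):
    """Hedged sketch of a rule-matrix check: split the IC region at `edges`,
    compute the ratio of charted points falling in each region, and signal
    when any ratio exceeds its threshold."""
    counts = np.histogram(points, bins=edges)[0]
    ratios = counts / len(points)
    return bool((ratios > np.asarray(thresholds)).any())
```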

To investigate the effect of run-rules in combination with ESVR, the proposed ESVR method was supplemented with a rule matrix (denoted ESVR-RULE). For this aim, the ratio of the ESVR statistic (Oj) in the run-rule regions was computed and compared with the limits of the rule matrix. The details of the rule matrix are omitted for the sake of brevity. For comparison purposes, the combination of a rule matrix and the MEWMA scheme (denoted MEWMA-RULE) was also provided. Figure 4a, b illustrates the results of ESVR, ESVR-RULE, MEWMA and MEWMA-RULE.

Fig. 4

The results of a ARL1 and b SDRL1 values for combination of ESVR and MEWMA with run-rules

From Fig. 4, it is observed that run-rules improved the performance of both MEWMA and ESVR in terms of ARL1 and SDRL1 under small and moderate shifts. In the IC condition, however, the SDRL0 of the MEWMA-RULE and ESVR-RULE were tangibly greater than those of MEWMA and ESVR, respectively, which may increase the number of false alarms in specific conditions. The second finding of this simulation study was that run-rules were not effective for large shifts. This finding is rational, as run-rules usually improve the detection of small shifts (Montgomery 2019). Finally, the ESVR-RULE performed better than the MEWMA-RULE, which again reveals the superior detection ability of the proposed method.

5.5 Effect of EA in training of ESVR

As mentioned in the previous section, the optimization problem in (16) is solved with PSO in ESVR. To show the superiority of PSO over other common EAs, three other well-known EAs, namely GA, DE and SA, were also employed. For this aim, all design steps were kept identical across the methods, which differed only in the algorithm used to solve Eq. (16). Figure 5 depicts the ARL1 and SDRL1 values for our proposed method, ESVR (PSO), and for DE, GA and SA.

Fig. 5

The results of a ARL1 and b SDRL1 values for different EA in training of ESVR

It is clear that ESVR (PSO) had the best detection ability for all shifts in terms of ARL1, and it was also the best approach for small and moderate shifts in terms of SDRL1; GA had a very small advantage over ESVR (PSO) for large shifts in terms of SDRL1. Hence, these and other similar simulations justified the choice of PSO for our proposed method. However, as stated in the literature (Adnan et al. 2021a; Ikram et al. 2022a; Kadkhodazadeh and Farzin 2022), some EAs such as EMPA, GBO, MFO, WCA and GWO may be highly sensitive to the initial parameters and adjustments. Hence, superior performance over PSO may be achievable with these approaches under some sensitivity analysis. This idea can be investigated in future work by interested researchers.

5.6 Effect of input features

To show the best performance of our proposed input structure \(\left( {I_{j} = \left[ {EWMA_{Pj} ,\frac{{d_{MEWMA}^{(1)} }}{j},\frac{{d_{MEWMA}^{(2)} }}{j},\frac{{d_{MEWMA}^{(3)} }}{j},M_{j} ,\frac{{d_{LRT}^{(1)} }}{j},\frac{{d_{LRT}^{(2)} }}{j},\frac{{d_{LRT}^{(3)} }}{j},LRT_{j} } \right]} \right)\), some other input combinations were defined as follows:

  • ESVR1: \(I_{j} = [EWMA_{Pj} ]\).

  • ESVR2: \(I_{j} = \left[ {EWMA_{Pj} ,\frac{{d_{MEWMA}^{(1)} }}{j},\frac{{d_{MEWMA}^{(2)} }}{j},\frac{{d_{MEWMA}^{(3)} }}{j},M_{j} } \right]\).

  • ESVR3: \(I_{j} = \left[ {EWMA_{Pj} ,\frac{{d_{LRT}^{(1)} }}{j},\frac{{d_{LRT}^{(2)} }}{j},\frac{{d_{LRT}^{(3)} }}{j},LRT_{j} } \right]\).

  • ESVR4: \(I_{j} = \left[ {\frac{{d_{MEWMA}^{(1)} }}{j},\frac{{d_{MEWMA}^{(2)} }}{j},\frac{{d_{MEWMA}^{(3)} }}{j},M_{j} } \right]\).

  • ESVR5: \(I_{j} = \left[ {\frac{{d_{LRT}^{(1)} }}{j},\frac{{d_{LRT}^{(2)} }}{j},\frac{{d_{LRT}^{(3)} }}{j},LRT_{j} } \right]\).

The training procedure for each of the above input combinations was the same as for ESVR; the only difference was the input size. Consequently, the dimensions of the inputs in the training data were p + 1, p + 5, p + 5, 4 and 4 for ESVR1, ESVR2, ESVR3, ESVR4 and ESVR5, respectively. Figure 6 depicts the performance of the ESVR approach for each predefined input combination.

Fig. 6
figure 6

The results of a ARL1 and b SDRL1 values for different input combinations

The superiority of ESVR over the other input combinations is obvious from Fig. 6. Removing any part of the input features leads to a noticeable deterioration in the performance of the ESVR scheme in terms of ARL1 and SDRL1. The loss in the ability to identify OOC situations is most apparent for large shifts such as δ1 = δ2 = 0.59, for which ESVR1 was nearly five (ten) times slower than ESVR in terms of ARL1 (SDRL1). As another finding, it can be inferred from the superiority of ESVR2 and ESVR3 over ESVR1, ESVR4 and ESVR5 that combining the control chart statistics with the EWMA form of the estimated parameters has a strong effect on detection ability. That is, the LRT and MEWMA statistics alone were not able to increase the detection ability; they required some characteristics of the process to reduce ARL1 and SDRL1. The weak performance of ESVR1, which utilized only the EWMA form of the estimated parameters, also confirms this argument.

6 Diagnosis aid

In some real cases, the practitioner is interested in identifying which parameters have shifted after an OOC signal has been detected; however, this task, called profile diagnosis, has received scant attention in the profile monitoring literature. For example, statistics have been proposed by Zou et al. (2007), Zou et al. (2008) and Huwang et al. (2016) for diagnosing the causes of shifts in linear, non-parametric and logistic profiles, respectively. Yeganeh and Shadman (2020) introduced a different approach using an ANN with signalling rules as a tool for profile diagnosis. However, to the best of the authors’ knowledge, there is no research work on diagnosis for Poisson profiles. In this paper, a novel structure based on a set of SVRs is proposed for diagnosis actions in Poisson profiles.

6.1 Requirements for profile diagnosis actions

There are two key points in the profile diagnosis simulations. Firstly, profile diagnosis is usually implemented after change point estimation; this step is not part of this paper, as we have assumed that all shifts occur from the onset of the process. The IC estimated profiles then need to be removed or ignored (see, for example, Fig. 9 in Yeganeh and Shadman (2020)) so that the machine learning procedure is based only on OOC samples when identifying the parameters that have changed. Secondly, for a fair judgement, it is assumed that the control charts use the same signalling method, because similar diagnosis techniques can yield different results under different signalling methods; see, for example, Zou et al. (2007) and Huwang et al. (2014). In this paper, the diagnosis actions are implemented after a signal is triggered by the ESVR control chart.

6.2 The proposed SVR structure in profile diagnosis

Following the model of Yeganeh and Shadman (2020), SVR is used in this paper for profile diagnosis actions. Yeganeh and Shadman (2020) used the EWMA statistics of the estimated parameters as the inputs of an ANN. Since it incorporates the information of previous samples, the EWMA statistic accounts for the change point effect automatically. Thus, after an OOC signal, the EWMAPj \(\left( {EWMA_{Pj} = \left[ {\beta^{\prime}_{1j} ,\beta^{\prime}_{2j} ,...,\beta^{\prime}_{pj} ,\overline{y}^{\prime}_{j} } \right]} \right)\) of the last sample (i.e., the \({j}^{th}\) sample, which here is the signalling sample; hereafter its index is omitted) is taken as the input vector for profile diagnosis. There is, however, a fundamental difference between ANN and SVR: while a different neuron can be assigned to each parameter in the output layer of an ANN, an SVR can only generate one output. This is the major challenge that arises when conducting profile diagnosis with SVR as a classification problem. To overcome it, we use an approach denoted SVRS (SVR Set), in which one SVR, called SVRD (SVR Diagnosis), is trained for each possible change, and the existence of that shift is identified by its SVRD. In this approach, there are in total p SVRDs in the diagnosis process; that is, SVRS consists of p SVRDs, namely SVRD1, SVRD2, …, SVRDp. For example, if there are two parameters in the IC profile, two SVRDs identify the shifted parameters such that the first and second SVRDs indicate shifts in the first and second parameters, respectively. Naturally, identification of a shift by both SVRDs indicates a simultaneous shift.

Also, a limit called CVD (Cutting Value Diagnosis) is assigned to each SVRD to identify the change in its parameter. To conduct diagnosis actions after an OOC signal by the ESVR control chart, the \((p+1)\)-dimensional vector EWMAP is computed from the last (i.e., current) sample and then fed to each SVRD. In other words, SVRD1, SVRD2, … and SVRDp all receive the same input. The outputs of the SVRDs are compared with their CVDs to identify the shifted parameters.
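The comparison step can be sketched as follows (a Python illustration with hypothetical names; the fitted SVRDs are represented simply by their numeric outputs):

```python
def diagnose(svrd_outputs, cvds):
    """Flag the k-th parameter as shifted when the output of SVRD_{k+1}
    exceeds its cutting value CVD_{k+1}; every SVRD sees the same EWMA_P
    input, so only the scalar outputs are needed here."""
    return [k for k, (out, cvd) in enumerate(zip(svrd_outputs, cvds))
            if out > cvd]

# with the outputs and cutting values of the illustrative example in Sect. 7
print(diagnose([1.16, -0.27], [0.39, 0.42]))  # → [0]: shift in the first parameter
```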

Considering the IC model given in (17) with p = 2, SVRS includes SVRD1 and SVRD2 with cutting values CVD1 and CVD2, which identify shifts in the first and second parameters, respectively. To better illustrate this procedure, Fig. 7 depicts the diagnosis steps of SVRS after an OOC signal is detected in the jth profile, using the IC model defined in (17) with p = 2.

Fig. 7
figure 7

The profile diagnosis procedure of SVRS approach after an OOC signal by ESVR when p = 2

To train each SVRD in SVRS, OOC profiles are generated until an OOC signal is obtained by ESVR. Then, the EWMAP is taken as the input of the training dataset. The targets are defined such that, for SVRDp, the target value is 1 when there is a shift in the pth parameter and 0 otherwise. Other training aspects and assigned limits are the same as those in Yeganeh and Shadman (2020); hence, the details are omitted here for brevity.
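The target construction above can be sketched as follows (a Python illustration under stated assumptions; `make_targets` and `shift_sets` are hypothetical names, not from the paper):

```python
import numpy as np

def make_targets(shift_sets, p):
    """Build the 0/1 training targets for the p SVRDs: column k is the
    target of SVRD_{k+1}, equal to 1 for profiles whose generating shift
    involved parameter k+1 and 0 otherwise. `shift_sets` holds, for each
    OOC training profile, the set of shifted parameter indices."""
    targets = np.zeros((len(shift_sets), p))
    for i, shifted in enumerate(shift_sets):
        for k in shifted:
            targets[i, k] = 1.0
    return targets

# three training profiles: intercept shift, slope shift, simultaneous shift (p = 2)
print(make_targets([{0}, {1}, {0, 1}], p=2))
```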

6.3 The accuracy of the proposed SVRS structure in profile diagnosis

Due to the lack of research on profile diagnosis for Poisson profiles, four other competitors, comprising three machine learning techniques and one statistical method, are considered in this paper. First, we use multiclass SVM (the details are omitted here to save space) with the ‘fitcecoc’ function in MATLAB, denoted MSVM. For a broader comparison, ANNs trained with back-propagation and with an entropy-based training algorithm (the ‘feedforward’ and ‘patternnet’ functions in MATLAB), denoted ANN-BP and Patternnet, are also employed for profile diagnosis. In addition, the Wald statistic proposed in Huwang et al. (2016) is computed as the last competitor (denoted Wald Test). The OOC profiles are generated from (17) with \(p=2\). The simulation procedure for obtaining the diagnosis accuracy of SVRS when there is a shift in the intercept is described in Pseudocode 3.

figure c

Note that Pseudocode 3 is adapted to the situations in which the OOC shift occurs in the slope, or in both parameters simultaneously, by replacing the last if-statement with the following:

  • If (output of SVRD1 < CVD1 & output of SVRD2 > CVD2) % Shift in slope

  • If (output of SVRD1 > CVD1 & output of SVRD2 > CVD2) % Shift in intercept and slope
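The accuracy estimation loop of Pseudocode 3 can be sketched as follows (a Python illustration; `diagnosis_accuracy`, `gen_signalling_input` and the stub predictors are hypothetical stand-ins for the paper's trained SVRDs and profile generator):

```python
def diagnosis_accuracy(svrds, cvds, gen_signalling_input, true_shift,
                       max_it=10_000):
    """Monte-Carlo accuracy estimate in the spirit of Pseudocode 3:
    generate a signalling OOC sample, run every SVRD on its EWMA_P input,
    and count the run as 'Corrected' only when the flagged set of
    parameters equals the truly shifted set."""
    corrected = 0
    for _ in range(max_it):
        x = gen_signalling_input()
        flagged = {k for k, (f, cvd) in enumerate(zip(svrds, cvds))
                   if f(x) > cvd}
        corrected += (flagged == true_shift)
    return corrected / max_it

# deterministic stubs standing in for trained SVRDs (intercept shift only)
acc = diagnosis_accuracy([lambda x: 1.16, lambda x: -0.27], [0.39, 0.42],
                         lambda: None, true_shift={0}, max_it=100)
print(acc)  # 1.0 with these deterministic stubs
```

In the real simulation each iteration regenerates OOC profiles until ESVR signals, so the SVRD outputs vary and the accuracy falls below 1.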

The diagnosis accuracy results for Poisson profiles, based on 10,000 iterations (MaxIt), are reported in Table 7, where boldfaced values denote the best performing scheme at that particular shift size. For example, the accuracy of SVRS for the first shift is 0.52, which means that the “Corrected” counter in Pseudocode 3 equals 5200. From Table 7, it can be seen that SVRS and Patternnet are preferred over the others in terms of average accuracy, while ANN-BP is the best method in terms of the standard deviation of the accuracies. MSVM and the Wald Test have biased performances; that is, for some shifts they are not able to detect any shift, while they have good accuracies for other shifts.

Table 7 Profile diagnosis accuracy

7 Illustrative example

A real-life application of Poisson profiles in the airline industry is provided here, taken from Chatterjee and Hadi (2013) and Alevizakos et al. (2019b). The aim of this example is to examine the relationship between the number of injury incidents and the proportion of total flights over time. Naturally, the probability of accidents is expected to increase with the proportion of total flights.

To this end, the accidents and injuries of nine major USA airlines were studied in these references. If all the airlines perform equally safely in a specific period, the injury incidents can be explained by the IC model, taking the number of flights of each airline as a percentage of the total number of flights as the explanatory variable and the injury incidents as the response variable. Following Eq. (2), the IC Poisson model is established from the relationship between the explanatory and response variables:

$$\begin{gathered} {\varvec{\beta}}_{{\varvec{0}}} = \left[ {0.8945\;\;8.5018} \right], \hfill \\ \tilde{\varvec{X}}^{\prime} = \left( {\begin{array}{*{20}c} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0.0503 & 0.054 & 0.0629 & 0.075 & 0.095 & 0.1292 & 0.1382 & 0.1920 & 0.2078 \\ \end{array} } \right), \hfill \\ n = 9, \;\; p = 2. \hfill \\ \end{gathered}$$
(21)

To attain an ARL0 of 200, the values of UCLMEWMA, UCLLRT and CV are obtained as 1.303, 10.53 and 6.61, respectively. As is common in Phase II applications, the intercept is changed to 0.965 as an artificial shift in order to produce an OOC signal. Table 8 gathers the details of the 11 generated OOC profiles: the estimated parameters (first part), normalized parameters (second part), EWMA statistics of the normalized parameters (third part), MEWMA and LRT statistics (fourth part) and the ratio of samples (fifth part). Note that the input vector of ESVR in this example has length 11 (p + 9); for example, the input for the first sample (j = 1) is [0.31 − 2.17 0.2 1 0 0 0.23 0 1 0 6.98].
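The IC model of Eq. (21) can be evaluated numerically; a short sketch, assuming Eq. (2) uses the canonical log link of the Poisson GLM (so the expected count is λ = exp(β0 + β1x)):

```python
import numpy as np

# IC parameters and explanatory values taken from Eq. (21)
beta0, beta1 = 0.8945, 8.5018
x = np.array([0.0503, 0.054, 0.0629, 0.075, 0.095,
              0.1292, 0.1382, 0.1920, 0.2078])

# under the log link, the expected number of injury incidents per airline
lam = np.exp(beta0 + beta1 * x)
print(np.round(lam, 2))  # expected counts increase with the flight share
```

As expected, a larger share of total flights yields a larger expected incident count, consistent with the positive slope.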

Table 8 The OOC profiles characteristics of the illustrative example

The outputs of ESVR for the above inputs are depicted in Fig. 8. The MEWMA and LRT statistics are also added to this figure to visualise their trends relative to ESVR. It can be observed from Fig. 8 that ESVR triggered an OOC signal at the 11th sample, while the LRT and MEWMA procedures were not able to detect this shift.

Fig. 8
figure 8

The statistics of the ESVR, MEWMA and LRT control charts for 11 OOC generated profiles in the illustrative example

To identify the shifted parameter, EWMAP = [0.1 − 0.59 1.15] is fed into SVRD1 and SVRD2, whose outputs are 1.16 and − 0.27, respectively. This indicates a shift in the first parameter, because CVD1 and CVD2 are 0.39 and 0.42 (i.e., 1.16 > 0.39 and − 0.27 < 0.42).

From this example, it is observed that Oj exceeds its control limit sooner than the MEWMA and LRT statistics do, and the results of this case study accord with the simulation results. This suggests that the proposed ESVR has excellent potential in practical Phase II SPC applications in comparison with the other competitors. These findings also provide evidence that the proposed diagnosis approach has a significant impact on the detection of shifted parameters. Therefore, the ESVR control chart is found to be more efficient in Poisson profile monitoring.

8 Conclusions

A novel use of SVR as a control chart was developed to monitor Poisson profiles in Phase II. This method, equipped with new input features and an evolutionary training procedure based on the PSO algorithm, is able to quickly detect OOC situations, thanks to the advantage of an evolutionary training framework in both parametric and non-parametric monitoring where the OOC model can be unknown. To design a more efficient method for identifying small and moderate shifts, the proposed scheme was incorporated with additional run-rules. Finally, a diagnostic procedure based on a set of SVRs was proposed and compared with ANN-based methods; both the SVR and ANN approaches showed good diagnosis ability. The contributions of this study are, firstly, the implementation of SVR as a control chart for monitoring Poisson profiles; secondly, the use of a novel input feature corresponding to the ratio of the MEWMA and LRT statistics; and lastly, the training of the SVR using the evolutionary PSO algorithm.

That said, owing to the requirements of evolutionary computation for training ESVR, the proposed approach demands more computation than statistical approaches such as MEWMA and LRT. Note, though, that this challenge commonly arises in machine learning applications, and due to the rapid advancement of artificial intelligence technology on both the software and hardware sides, its importance has decreased in recent years.

For future research, the investigation of other evolutionary algorithms with different IC profile types and sample sizes would be worthwhile. Moreover, the proposed method can conveniently and effectively tackle other non-parametric profile monitoring problems; for example, applying the proposed approach with a linear IC model could be a good direction for future work.