1 Introduction

Statistical process monitoring (SPM) is usually employed in industrial processes to detect and eliminate assignable causes that deteriorate the product outcome. SPM is the major field for controlling process variations, leading to lower costs for waste, scrap, rework and claims, better quality, and more insight into the capability of the process. Seven main tools, comprising scatter diagrams, Pareto charts, control charts, histograms, cause-and-effect diagrams, check sheets and stratification, are utilized in SPM to implement inspection and monitoring procedures [1]. Among them, control charts are the most successful and effective tools for quality control of manufacturing processes [2,3,4].

To employ a control chart for process monitoring, two phases, Phase I and Phase II, must be distinguished. Phase I aims at obtaining proper estimates of the process parameters, whereas Phase II monitoring aims to detect assignable causes that move the process from the In-Control (IC) to an Out-of-Control (OC) state [5,6,7,8,9]. The Average Run Length (ARL) and the Standard Deviation of Run Length (SDRL) are two common performance indicators in Phase II. The ARL is the average number of samples taken by the predefined control chart before it triggers an OC signal. Thus, a control chart with larger (smaller) values of the ARL is to be preferred when the underlying process is in the IC (OC) state [the more common notation is ARL0 (ARL1)] [10,11,12]. The SDRL is defined analogously as a secondary criterion in Phase II (i.e., SDRL0 and SDRL1).

There are two common approaches to monitor a manufacturing process with the help of control charts, namely monitoring quality characteristics and profile monitoring [13]. In this paper, we focus on profile monitoring. Here, the quality of a process or product is modelled via a functional relationship between a response (dependent) variable and one or more explanatory (independent) variable(s) [14]. The aim of profile monitoring is to check the stability of a predefined IC relationship (or profile) over time, and it is essential to obtain a true OC signal as soon as possible when the IC model shifts to an unknown OC profile [15].

Different IC models can be employed depending on the nature of the underlying problem, such as circular [16], linear [5, 17,18,19,20,21,22], logistic [23,24,25], nonlinear [26], nonparametric [27, 28], multichannel [29], polynomial [30] or quadratic [31]. Among them, linear profiles have received the most attention in the literature [32, 33]. The majority of previous studies on linear profile monitoring assumes that the error terms are independent both within and between profiles. However, this assumption is often violated in manufacturing practice, where autocorrelated profiles are common, and consequently, conventional approaches may lead to inaccurate outcomes for this type of profiles.

The related literature distinguishes autocorrelated profiles with within- and with between-profile correlation [34,35,36,37]. For the first group, Soleimani, Noorossana and Amiri [38] developed four control charts, including T2 and three well-known Exponentially Weighted Moving Average (EWMA) charts, considering the first-order Autoregressive model, i.e., AR(1). The results showed the superiority of the EWMA-based approaches over T2. Autoregressive Moving Average (ARMA), Vector ARMA (VARMA) etc. are other, more complex models that have been investigated in this field [39,40,41,42,43]. Due to the higher application potential of the second group, this paper focuses on between-profile autocorrelation. To the best of the authors' knowledge, the pioneering work is Noorossana, Amiri and Soleimani [44], in which T2, EWMA of residuals (EWMA/R) and transformed individual EWMA (EWMA-3) control charts were developed for situations where autocorrelation exists between profiles. Similar to Soleimani, Noorossana and Amiri [38], they concluded that the EWMA-based methods outperform T2. Wang and Lai [45] aggregated the individual EWMA statistics into a Multivariate EWMA (MEWMA) control chart for profiles with between-autocorrelation and showed that MEWMA outperforms T2. Khedmati and Niaki [46] considered both linear and polynomial profiles; they first utilized the U statistic to remove the effect of autocorrelation and then developed a T2-based control chart. The experimental results showed that this method performs better than conventional T2 control charts, but comparisons with EWMA are missing. Koosha and Amiri [47] proposed a similar T2-based control chart for monitoring autocorrelated logistic profiles. Wang and Huang [48] modified the estimation procedure of the EWMA approach, and their simulation results demonstrated a faster detection ability than that of the conventional EWMA.

From the literature, it can be inferred that autocorrelation is very likely to occur in practical applications. Early detection of OC conditions is therefore of great practical importance, as a delay in detection may result in the production of nonconformities and additional costs. However, conventional control charts such as T2, EWMA/R and EWMA-3 do not perform well in this respect, as their performance deteriorates in the presence of autocorrelation compared with the uncorrelated case; compare, for example, the ARL1 results in Noorossana, Amiri and Soleimani [44] with those in Kim et al. [20].

Hence, proposing a novel control chart with a tangible reduction of the OC signaling time for autocorrelated profiles is crucial. To address this challenge, several recent studies have incorporated machine learning techniques into the SPM context for monitoring roundness [49], nonlinear [50,51,52], linear [53, 54] and logistic [25, 55, 56] profiles. As a different approach, Chen et al. [57] employed a deep learning technique, called stacked denoising autoencoders, to monitor autocorrelated profiles. Specifically, this scheme extracts a number of features from the process using autoencoders, and the extracted features are then used to develop control charts based on T2 and EWMA. In other words, the main task of their approach is to select proper features from the process, whereas the direct usage of machine learning techniques as a control chart would be more promising. As far as the authors know, there are no further articles in which machine learning techniques are employed for monitoring autocorrelated profiles.

The aim of this paper is to develop a robust control chart based on machine learning techniques that alleviates the above-mentioned challenges, i.e., one that reduces the values of ARL1 and SDRL1 for autocorrelated linear profiles and thereby enables early detection of OC situations in Phase II. To achieve this, three combinations of input features based on the effect of the mean of responses, the mean of errors, and the T2 statistic, each in the current and the previous sample, are defined to be fed into the machine learning techniques for monitoring profiles with between-autocorrelation of first order, i.e., AR(1). Since each machine learning technique performs differently on different problems, six techniques ranging from shallow to deep structures, including the adaptive neuro-fuzzy inference system (ANFIS), an artificial neural network (ANN) with Back-Propagation (BP) training, a Convolutional Neural Network (CNN), a long short-term memory (LSTM) network, a Radial Basis Function (RBF) network and support vector regression (SVR), are employed to find the most appropriate one. To sum up, the main contributions of this paper are as follows:

  • Improving the detection ability of Phase II control charts for monitoring linear autocorrelated profiles with the help of machine learning techniques,

  • Defining different combinations of input features based on the effect of the mean of responses, the mean of errors, and T2 statistic, each in the current and previous sample, for monitoring the between-autocorrelation effect of profiles,

  • Evaluating the performance of the defined input features and finding the best combination using the proposed machine learning-based control chart, and

  • Identification of the most appropriate machine learning technique under the most suitable input combination for this problem.

The rest of this article is organized as follows. In Sect. 2, definitions of autocorrelated linear profiles are discussed. Section 3 presents the framework of the proposed approach. Results of simulation studies regarding performance comparisons are given and discussed in Sect. 4. To show the effectiveness of our method, an illustrative example is given in Sect. 5. Finally, Sect. 6 gives some conclusions and suggests future research directions.

2 Preliminaries

In this section, first, the general relations of linear profiles with between-autocorrelated error terms are presented. Then, three common control charts, namely T2, EWMA/R and EWMA-3, are briefly introduced. Finally, details about the Ordinary Least Squares (OLS) estimation of the parameters are discussed. These basics are needed because (1) our proposed method employs the T2 statistic as an input feature, and (2) these conventional charts serve as benchmarks in our analyses.

2.1 The linear autocorrelated profile in Phase II

A common linear profile, which is the simplest but the most fundamental type of profiles [19], is defined as:

$$\begin{aligned} &Y_{ij} = A_{0} + A_{1} X_{i} + \varepsilon_{ij} ,\varepsilon_{ij} \sim N\left( {0,\sigma^{2} } \right),\\ & i = 1,2, \ldots ,n;\,j = 1,2, \ldots , \end{aligned}$$
(1)

where Xi represents the explanatory variable in a linear profile and the response variable Yij is the quality characteristic under study. The parameters of the above IC model (intercept A0, slope A1 and error variance σ2) are estimated from Phase I samples, and it is usually assumed that the sample size n and the independent variable Xi are fixed for each profile. When there is an AR(1) structure between the random error terms, (1) becomes:

$$ \begin{aligned}&Y_{ij} = A_{0} + A_{1} X_{i} + \varepsilon_{ij} ,\varepsilon_{ij} = \phi \varepsilon_{{i\left( {j - 1} \right)}} + a_{ij} ,a_{ij} \sim N\left( {0,\sigma^{2} } \right),\\&i = 1,2,...,n;\quad j = 1,2, \ldots , \end{aligned}$$
(2)

where ϕ is a constant autocorrelation coefficient, which is assumed to be known in Phase II. To monitor the above IC profile, we briefly present three common approaches in the following subsections.
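
For illustration, the following MATLAB sketch simulates a stream of profiles from model (2). It assumes the IC values used later in Sect. 4 (A0 = 3, A1 = 2, σ = 1 and X = 2, 4, 6, 8), picks ϕ = 0.1 arbitrarily, and starts the error chain from its stationary distribution; all of these choices are illustrative.

```matlab
% Simulate J successive profiles from the AR(1) model in Eq. (2).
A0 = 3; A1 = 2; sigma = 1; phi = 0.1;      % illustrative IC parameters (Sect. 4)
X = [2 4 6 8]'; n = numel(X);              % fixed design points
J = 1000;                                  % number of profiles to generate
Y = zeros(n, J);                           % Y(:, j) holds the j-th profile
err = (sigma/sqrt(1 - phi^2))*randn(n, 1); % stationary start of the error chain
for j = 1:J
    err = phi*err + sigma*randn(n, 1);     % between-profile AR(1) update, a_ij ~ N(0, sigma^2)
    Y(:, j) = A0 + A1*X + err;             % responses of the j-th profile
end
```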

2.2 The T2 control chart for monitoring autocorrelated profiles in Phase II

With some calculations, it can easily be shown that the estimated responses of the jth generated sample are obtained as follows [44]:

$$ \hat{y}_{ij} = \phi y_{{i\left( {j - 1} \right)}} + \left( {1 - \phi } \right)(A_{0} + A_{1} X_{i} ). $$
(3)

Thus, the empirical residuals can be written as:

$$ e_{ij} = \hat{y}_{ij} - y_{ij} . $$
(4)

Noorossana, Amiri and Soleimani [44] used a modified form of the T2 statistic proposed by Kang and Albin [14] for the simple linear profile:

$$ t_{j}^{2} = \underline{e}_{j} \Sigma_{e}^{ - 1} \underline{e}_{j}^{\prime} , \quad \underline{e}_{j} = \left( {e_{1j} ,e_{2j} , \ldots ,e_{nj} } \right), $$
(5)

where Σe is the symmetric n × n matrix σ2I. Since the chart statistic \(t_{j}^{2}\) is nonnegative by construction, it is only compared with a predefined Upper Control Limit (UCLT) to obtain an OC signal; the Lower Control Limit (LCLT) equals 0.
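
As a minimal sketch under the notation above, the T2 monitoring step for one incoming profile may be coded as follows in MATLAB; here, yPrev and yCurr denote the (n × 1) response vectors of samples j−1 and j, and UCLT is assumed to have been calibrated beforehand by simulation.

```matlab
% T2 monitoring step per Eqs. (3)-(5) for one incoming profile.
yhat = phi*yPrev + (1 - phi)*(A0 + A1*X);  % estimated responses, Eq. (3)
e    = yhat - yCurr;                       % empirical residuals, Eq. (4)
t2   = (e'*e)/sigma^2;                     % Eq. (5), using Sigma_e = sigma^2*I
signalT2 = t2 > UCLT;                      % OC signal iff t2 exceeds UCL_T
```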

2.3 The EWMA/R control chart for monitoring autocorrelated profiles in Phase II

In the EWMA/R control chart, two simultaneous statistics monitor the generated profiles. The first statistic is related to the mean of the residuals and is defined as follows [44, 53]:

$$ z_{j} = \theta \overline{e}_{j} + (1 - \theta )z_{(j - 1)} ,\overline{e}_{j} = \frac{1}{n}\sum\limits_{i = 1}^{n} {e_{ij} } , $$
(6)

In (6), it holds z0 = 0, and θ is the EWMA constant that usually has a value between 0.1 and 0.9 [20]. Following previous works [21, 53, 54, 58], θ is set to 0.2 in this paper.

The second statistic of the EWMA/R chart is the range of the empirical residuals defined by [37, 48]:

$$ r_{j} = \max_{i} (e_{ij} ) - \min_{i} (e_{ij} ). $$
(7)

The EWMA/R declares the process as IC if both of the following conditions are met [14]:

$$ \begin{aligned}- L\sigma \sqrt {\frac{\theta }{{\left( {2 - \theta } \right)n}}} < z_{j} < L\sigma \sqrt {\frac{\theta }{{\left( {2 - \theta } \right)n}}} ,\\ \sigma \left( {d_{2} - Ld_{3} } \right) < r_{j} < \sigma \left( {d_{2} + Ld_{3} } \right)\end{aligned} $$
(8)

In (8), the value of L is assigned to reach a predefined ARL0, while d2 and d3 are two constants that depend on the sample size (see Montgomery [1] for more details).
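
A minimal MATLAB sketch of one EWMA/R monitoring step is given below; the residual vector e is computed as in (4), z carries over between samples (z = 0 at start-up), L is assumed to have been calibrated by simulation, and d2 = 2.059, d3 = 0.880 are the standard constants for n = 4.

```matlab
% One EWMA/R update per Eqs. (6)-(8).
theta = 0.2;                                % smoothing constant (see above)
d2 = 2.059; d3 = 0.880;                     % range-chart constants for n = 4
z = theta*mean(e) + (1 - theta)*z;          % EWMA of the mean residual, Eq. (6)
r = max(e) - min(e);                        % residual range, Eq. (7)
wZ  = L*sigma*sqrt(theta/((2 - theta)*n));  % half-width of the z-limits, Eq. (8)
icZ = abs(z) < wZ;                          % first condition in (8)
icR = (r > sigma*(d2 - L*d3)) && (r < sigma*(d2 + L*d3));  % second condition in (8)
signalEWMAR = ~(icZ && icR);                % OC signal if either condition fails
```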

2.4 The EWMA-3 control chart for monitoring autocorrelated profiles in Phase II

To solve the problem of dependency between the estimators in linear profiles, Kim, Mahmoud and Woodall [20] suggested centering the explanatory variables (zero mean), for which the least squares estimators of slope and intercept become independent random variables. With this coding (transformation), the EWMA-3 approach has the IC model

$$ \begin{aligned}Y_{ij} = B_{0} + B_{1} X_{di} + \varepsilon_{ij} ,\quad \varepsilon_{ij} = \phi \varepsilon_{{i\left( {j - 1} \right)}} + a_{ij} ,\quad a_{ij} \sim N\left( {0,\sigma^{2} } \right),\\i = 1,2, \ldots ,n;\quad j = 1,2, \ldots ,\end{aligned} $$
(9)

where the coded explanatory variables (\({X}_{di}={X}_{i}-\overline{X }\)) lead to the transformed IC intercept \(B_{0} = A_{0} + A_{1}\overline{X}\); \({A}_{0}\) and \({A}_{1}\) are defined as in (1). Note that the transformed IC slope is B1 = A1 in this approach. The OLS estimation of the parameters (\({\widehat{B}}_{0j},{\widehat{B}}_{1j},{\widehat{\sigma }}_{j}^{2}\)) generates three separate EWMA-based errors for the intercept (\({e}_{Ij}\)), the slope (\({e}_{Sj}\)) and the standard deviation (\({e}_{ij}\)), as follows:

$$ \begin{aligned} e_{Ij} & = \hat{B}_{0j} - \phi \hat{B}_{{0\left( {j - 1} \right)}} - \left( {1 - \phi } \right)B_{0} , \\ e_{Sj} & = \hat{B}_{1j} - \phi \hat{B}_{{1\left( {j - 1} \right)}} - \left( {1 - \phi } \right)B_{1} , \\ e_{ij} & = y_{ij} - \phi y_{{i\left( {j - 1} \right)}} - \left( {1 - \phi } \right)\left( {B_{0} + B_{1} X_{di} } \right), \\ {\text{MSE}}_{j} & = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} e_{ij}^{2} . \\ \end{aligned} $$
(10)

Based on the OLS estimation, the Mean Square Error (MSE) of the jth profile is considered as the estimator of the error variance (for details, see Kim et al. [20], Huwang et al. [59] and Yeganeh and Shadman [54]); thus, three chart statistics can be calculated as follows:

$$ \begin{aligned} {\text{EWMA}}_{Ij} & = \theta e_{Ij} + \left( {1 - \theta } \right){\text{EWMA}}_{{I\left( {j - 1} \right)}} , \\ {\text{EWMA}}_{Sj} & = \theta e_{Sj} + \left( {1 - \theta } \right){\text{EWMA}}_{{S\left( {j - 1} \right)}} , \\ {\text{EWMA}}_{Ej} & = \max \left[ {\theta \left( {{\text{MSE}}_{j} - 1} \right) + \left( {1 - \theta } \right){\text{EWMA}}_{{E\left( {j - 1} \right)}} ,0} \right]. \\ \end{aligned} $$
(11)

The control limits of the three separate control charts are designed as:

$$ \begin{aligned} {\text{UCL}}_{I} & = - {\text{LCL}}_{I} = L_{I} \sigma \sqrt {\frac{\theta }{{n\left( {2 - \theta } \right)}}} \\ {\text{UCL}}_{S} & = - {\text{LCL}}_{S} = L_{S} \sigma \sqrt {\frac{\theta }{{n\left( {2 - \theta } \right)}}} \\ {\text{UCL}}_{E} & = L_{E} \sqrt {\frac{{\theta {\text{Var}}\left( {{\text{MSE}}_{j} } \right)}}{{\left( {2 - \theta } \right)}}} \\ \end{aligned} $$
(12)

It is worth noting that \({\mathrm{LCL}}_{E}=0\); suggestions regarding \(\mathrm{Var}({\mathrm{MSE}}_{j})\) can be found in Kim, Mahmoud and Woodall [20], Noorossana et al. [44] and Hosseinifard et al. [53]. EWMA-3 triggers an OC signal when at least one of the statistics exceeds its control limits. The constants LI, LS and LE are usually adjusted to reach a desired value of ARL0 in such a way that each of the separate charts achieves an identical individual ARL0.
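
The sketch below condenses one EWMA-3 monitoring step in MATLAB. B0hat and B1hat (prevB0, prevB1) denote the OLS estimates of the current (previous) profile, Xd the centered design points, the three EWMA statistics carry over between samples (all 0 at start-up), and LI, LS, LE and varMSE (= Var(MSEj)) are design inputs taken from the cited references; all variable names are illustrative.

```matlab
% One EWMA-3 update per Eqs. (10)-(12).
eI = B0hat - phi*prevB0 - (1 - phi)*B0;            % intercept error
eS = B1hat - phi*prevB1 - (1 - phi)*B1;            % slope error
eV = yCurr - phi*yPrev - (1 - phi)*(B0 + B1*Xd);   % n x 1 residual vector
MSEj = mean(eV.^2);                                % Eq. (10)
ewmaI = theta*eI + (1 - theta)*ewmaI;              % Eq. (11)
ewmaS = theta*eS + (1 - theta)*ewmaS;
ewmaE = max(theta*(MSEj - 1) + (1 - theta)*ewmaE, 0);
limIS = @(Lf) Lf*sigma*sqrt(theta/(n*(2 - theta)));          % limits, Eq. (12)
signal3 = abs(ewmaI) > limIS(LI) || abs(ewmaS) > limIS(LS) ...
       || ewmaE > LE*sqrt(theta*varMSE/(2 - theta));         % OC if any chart signals
```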

2.5 OLS estimation of the parameters

Since the parameters estimated via OLS are employed as inputs of the machine learning techniques, some details on the OLS estimators are given below [20, 21, 53]. While the intercept of the original and the transformed model is estimated by \({\widehat{A}}_{0j}={\overline{Y}}_{j}-{\widehat{A}}_{1j}{\overline{X}}\) and \({\widehat{B}}_{0j}= {\overline{Y}}_{j}\), respectively, the slope parameter in both models is estimated via \({\widehat{A}}_{1j} = {\widehat{B}}_{1j} = \frac{ S_{{XY_{j} }} } { S_{XX} }\), where \({S}_{XY_{j}}=\sum\nolimits_{i=1}^{n}Y_{ij}(X_{i}-{\overline{X}})\) and \({S}_{XX}=\sum\nolimits_{i=1}^{n}(X_{i}-{\overline{X}})^{2}\) (with \({\overline{Y}}_{j}=\frac{1}{n}\sum\nolimits_{i=1}^{n}Y_{ij}\) and \({\overline{X}}=\frac{1}{n}\sum\nolimits_{i=1}^{n}X_{i}\)). Note that MSEj as defined in (10) is equivalent to the estimator of the error variance.
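
For completeness, a brief MATLAB sketch of these OLS computations for one profile (response vector y) reads:

```matlab
% OLS estimates of Sect. 2.5 for the j-th profile.
Xbar = mean(X); Ybar = mean(y);
Sxx  = sum((X - Xbar).^2);
Sxyj = sum(y .* (X - Xbar));
B1hat = Sxyj/Sxx; A1hat = B1hat;   % slope (identical in both parameterizations)
B0hat = Ybar;                      % transformed intercept
A0hat = Ybar - A1hat*Xbar;         % original intercept
resid = y - (A0hat + A1hat*X);     % fitted residuals
sigmahat = sqrt(mean(resid.^2));   % error s.d. via the profile-wise MSE (cf. (10))
```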

3 The proposed control chart for monitoring autocorrelated profiles

The basic idea of this paper is to use machine learning techniques instead of statistical control charts for monitoring profiles with between-autocorrelation. For this purpose, several features are extracted from the process and embedded into the machine learning techniques. Using training patterns together with the obtained control limits helps to improve decision-making about the process.

To employ a machine learning technique as a control chart, four main steps are defined in the following. In the first step, the structure of the input features and outputs of a machine learning technique are determined, and then, a training data set based on the input features and outputs is generated by simulating IC and OC profiles in the second step. The third step uses the generated data set to train a machine learning technique, and finally, by the definition of a control limit, the machine learning technique provides information about the process condition in the fourth step. Figure 1 shows a step-by-step flowchart regarding the proposed method in monitoring autocorrelated profiles. Moreover, details of these steps are presented in the following subsections.

Fig. 1 The general step-by-step flowchart of the proposed method

3.1 Defining the structure of input features and outputs

Extracting proper input features is a key step in the implementation of machine learning techniques [60]. In the literature, various strategies have been employed for extracting features. For example, Chen et al. [57] and Sergin and Yan [51] used autoencoders to obtain Phase I information, which is not compatible with this study. Hosseinifard et al. [53] and Yeganeh and Shadman [54] suggested taking the OLS estimates of the linear profile parameters as input features. This input structure performs well for profiles without autocorrelation, but it does not provide reliable results when between-profile autocorrelation is present, since the effect of autocorrelation is not captured by the distribution of the OLS estimators. Therefore, in addition to these inputs (\({\widehat{A}}_{0j}\), \({\widehat{A}}_{1j}\), \({\widehat{\sigma }}_{j}\)), further input features are proposed in the following, which are suitable to account for between-autocorrelation effects of first order (i.e., AR(1) autocorrelation). In particular, the proposed input structure addresses the effect of the mean of responses, the mean of errors, and the T2 statistic, each in the current sample (j) and the previous sample (j−1). It should be mentioned that one of the main benefits of machine learning-based algorithms is their independence from the basic assumptions about the process. Thus, by replacing the estimates with proper ones for other autocorrelation models such as MA, ARMA and ARIMA, the proposed method can easily be applied to other process conditions.

In machine learning, there are several approaches for normalizing and scaling the inputs, one of which is based on the distribution of the parameters. For instance, Yeganeh and Shadman [54] scaled the parameters of simple linear profiles, i.e., intercept, slope and standard deviation, using the normal and chi-square distributions (the relations are not reported for brevity; the interested reader is referred to Eqs. (4) to (9) in Yeganeh and Shadman [54]). Since autocorrelation was not considered in Yeganeh and Shadman [54] for the OLS estimators, we suggest capturing the between-autocorrelation effect of first order by means of deviations between the current and previous OLS estimates and their respective IC values, i.e., \({\widehat{A}}_{0j}-{A}_{0}\), \({\widehat{A}}_{1j}-{A}_{1}\), \({\widehat{\sigma }}_{j}-{\sigma }_{0}\) for the current sample and \({\widehat{A}}_{0(j-1)}-{A}_{0}\), \({\widehat{A}}_{1(j-1)}-{A}_{1}\), \({\widehat{\sigma }}_{(j-1)}-{\sigma }_{0}\) for the previous one. Using this approach, the means of the current and previous responses (\(\overline{y}_{j} ,\overline{y}_{{\left( {j - 1} \right)}}\)) are automatically incorporated in the input structure as they are functions of the OLS estimates \(\left( {\hat{A}_{0j} = \overline{y}_{j} - \hat{A}_{1j} \overline{x},\;\hat{A}_{{0\left( {j - 1} \right)}} = \overline{y}_{{\left( {j - 1} \right)}} - \hat{A}_{{1\left( {j - 1} \right)}} \overline{x}} \right)\).

By considering the above inputs, numerous input combinations can be defined in the proposed method. Having investigated several candidates, we employ the following three input combinations I, II and III with 8, 4 and 10 inputs, respectively, for each machine learning technique, based on their ability to (1) adequately address the specific effects of between-autocorrelation of type AR(1) and (2) incorporate the deviations between the OLS estimates and their respective IC values for efficient Phase II monitoring:

  (I)

    \({\widehat{A}}_{0(j-1)}-{A}_{0}\), \({\widehat{A}}_{1(j-1)}-{A}_{1}\), \({\widehat{\sigma }}_{(j-1)}-{\sigma }_{0}\), \({\widehat{A}}_{0j}-{A}_{0}\), \({\widehat{A}}_{1j}-{A}_{1}\), \({\widehat{\sigma }}_{j}-{\sigma }_{0}\), \({\overline{e}}_{j}\), \({\overline{e}}_{(j-1)}\).

  (II)

    \({\widehat{\sigma }}_{(j-1)}-{\sigma }_{0}\), \({\widehat{\sigma }}_{j}-{\sigma }_{0}\), t2(j−1), t2j.

  (III)

    \({\widehat{A}}_{0(j-1)}-{A}_{0}\), \({\widehat{A}}_{1\left(j-1\right)}-{A}_{1}\), \({\widehat{\sigma }}_{(j-1)}-{\sigma }_{0}\), \({\widehat{A}}_{0j}-{A}_{0}\), \({\widehat{A}}_{1j}-{A}_{1}\), \({\widehat{\sigma }}_{j}-{\sigma }_{0}\), \({\overline{e}}_{j}\), \({\overline{e}}_{(j-1)}\), t2(j−1), t2j.

These input combinations are motivated by their ability to consider effects regarding autocorrelation in various ways. While input combination I addresses the raw AR(1) structure of the underlying model, input combination II aims to isolate the effects of the current and previous T2 statistics extended by an additional consideration of the current and previous error variances. Finally, input combination III is the union of input combinations I and II, and therefore, combines both main effects. In these three input combinations, the following notations are utilized:

  • Estimated parameters via OLS in the previous sample (\({\widehat{A}}_{0(j-1)}\), \({\widehat{A}}_{1(j-1)}\), \({\widehat{\sigma }}_{(j-1)}\)).

  • Estimated parameters via OLS in the current sample (\({\widehat{A}}_{0j}\), \({\widehat{A}}_{1j}\), \({\widehat{\sigma }}_{j}\)).

  • Mean of error terms in the previous sample (\({\overline{e}}_{(j-1)}\)).

  • Mean of error terms in the current sample (\({\overline{e}}_{j}\)).

  • T2 statistic in the previous sample (t2(j−1)).

  • T2 statistic in the current sample (t2j).

Note that only the T2 statistic is added as an input feature, and no further statistics from other conventional control charts, in order to (1) compensate for the comparatively poor performance of the T2 control chart, and (2) avoid overparameterization and undue complexity with regard to the other common competitors.
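
Collecting the quantities above, the feature vector of input combination III for the jth sample may be assembled as in the following sketch; the variable names (prevA0, ebarCurr, t2Prev, etc.) are illustrative, and combinations I and II are simply subsets of this vector.

```matlab
% Feature vector of input combination III (10 inputs) for sample j.
xIII = [prevA0 - A0, prevA1 - A1, prevSigma - sigma0, ...  % previous-sample deviations
        A0hat  - A0, A1hat  - A1, sigmahat  - sigma0, ...  % current-sample deviations
        ebarCurr, ebarPrev, t2Prev, t2Curr];               % error means and T2 terms
xI  = xIII(1:8);          % input combination I (8 inputs)
xII = xIII([3 6 9 10]);   % input combination II (4 inputs)
```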

3.2 Generation of the training data set

To construct the training data set, IC and OC profiles are generated by means of simulation. From the simulated profiles, the inputs are constructed based on the three predefined input structures I, II and III of Sect. 3.1. For example, in input combination III, the inputs of the jth generated profile consist of the 10 features \({\widehat{A}}_{0(j-1)}-{A}_{0}\), \({\widehat{A}}_{1\left(j-1\right)}-{A}_{1}\), \({\widehat{\sigma }}_{(j-1)}-{\sigma }_{0}\), \({\widehat{A}}_{0j}-{A}_{0}\), \({\widehat{A}}_{1j}-{A}_{1}\), \({\widehat{\sigma }}_{j}-{\sigma }_{0}\), \({\overline{e}}_{(j-1)}\), \({\overline{e}}_{j}, t^{2}_{(j-1)}\,{\text{and}}\,t^{2}_{j}\). Following the suggestions of Hosseinifard, Abdollahian and Zeephongsekul [53], equal numbers of IC and OC profiles are generated, with the target values of IC and OC profiles set to 0 and 1, respectively.

Hence, we consider the size of the training data set (number of rows) as 6G. First, 3G IC profiles are generated and their input features are recorded with a target value equal to 0. Then, 3G OC profiles (G profiles with shift in intercept, G profiles with shift in slope, and G profiles with shift in standard deviation) are obtained in the same way with a target value equal to 1. Finally, the training data set has 6G rows and 9 (8 + 1), 5 (4 + 1) and 11 (10 + 1) columns for input combinations I, II and III, respectively (note that the last column represents the target values). For better understanding, pseudo code 1 illustrates the process of data set generation for input combination III (an analogous procedure also applies to input combinations I and II).

Pseudo code 1: Generation of the training data set for input combination III
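
The following MATLAB sketch mirrors pseudo code 1 under illustrative assumptions: G and the OC shift sizes are example values, and simFeatures() is a hypothetical helper that simulates a pair of consecutive profiles from (2) with the given parameters and returns the corresponding 10 features of Sect. 3.1 as a row vector.

```matlab
% Build the 6G x 11 training data set for input combination III.
G = 5000;                               % illustrative choice of G
shifts = [0.4 0 0; 0 0.05 0; 0 0 0.2];  % example shifts in intercept, slope, s.d.
D = zeros(6*G, 11);
for g = 1:3*G                           % 3G IC rows, target value 0
    D(g, :) = [simFeatures(A0, A1, sigma, phi, X), 0];
end
row = 3*G;
for s = 1:3                             % 3G OC rows (G per parameter), target value 1
    for g = 1:G
        row = row + 1;
        D(row, :) = [simFeatures(A0 + shifts(s,1)*sigma, ...
                                 A1 + shifts(s,2)*sigma, ...
                                 sigma*(1 + shifts(s,3)), phi, X), 1];
    end
end
```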

3.3 Training a machine learning technique

After obtaining a training data set, a machine learning technique can be trained on its basis. In this paper, six common machine learning techniques, i.e., ANFIS, ANN, RBF, SVR, CNN and LSTM, with the ability to generate continuous outputs are investigated. For better understanding, we provide a brief description of the parameters and adjustments of each method in the MATLAB software.

  • The ANFIS approach utilizes fuzzy IF–THEN rules and trains its parameters with basic algorithms such as subtractive clustering and grid partitioning, which differ in how the rules are generated. The ‘genfis’ function, which generates a single-output Sugeno fuzzy inference system, is used to obtain a grid partition for the training procedure.

  • The ANN structure, especially a Multi-Layer Perceptron (MLP), with gradient-based optimization is employed here. An important issue in ANNs is related to the adjustment of the number of hidden layers and the neurons. The function ‘feedforwardnet’ utilises a fully connected network architecture using the BP Levenberg–Marquardt training algorithm (‘trainlm’ option). In this study, a single hidden layer with 10 neurons is suggested for training.

  • RBF considers different training approaches based on the idea of clustering. It has only one hidden layer, to which neurons with a distinct spread (radius) are added until the pre-specified error or the maximum number of neurons is reached. Because the training procedure is driven solely by the reduction of the training error, the probability of overfitting is generally high with this approach. The function ‘newrb’ with spread (radius) 1, error goal 0.05 and a maximum of 100 neurons is selected for training purposes.

  • As additional machine learning techniques, SVM and SVR obtain their parameters by constructing a hyperplane in the problem space that minimizes the gaps between the predicted and observed values, using a kernel function to map the inputs into that space. As we aim at a continuous output (regression problem) in this study, the SVR function ‘fitrsvm’ with the Gaussian kernel function is used for training (more details about the classification and regression nature of machine learning-based control charts can be found in Yeganeh and Shadman [54]). The parameter epsilon, which determines the margin between the real and the estimated planes, is an important parameter of the SVR technique. As the common range of epsilon is [0.3, 0.5] [61], the value 0.3 is selected in this paper.

  • The deep learning technique CNN is utilized to investigate its detection ability in the SPM field. Generally, a CNN layer moves filters along the input vertically and horizontally, computes the dot product of the weights and the input, and then adds a bias term to derive novel features from the process. CNNs have several parameters such as padding and filter size. As the inputs of this paper are in vector form, the layers are created with the ‘convolution1dLayer’ function with filter size 5.

  • As a further deep learning technique, an LSTM network is trained to evaluate its performance. Owing to its gated memory units, which explicitly model time dependencies, LSTM can identify time-series-related patterns effectively. The most important parameter of LSTM is the number of hidden neurons in each unit, which is roughly analogous to the number of hidden neurons in common ANNs. Two LSTM layers are defined via the ‘lstmLayer’ function with 40 hidden neurons each, and the Adam optimizer is utilised to obtain the best weights. A sketch of possible instantiations of all six models is given after this list.
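
As a hedged sketch (not the authors' exact scripts), the six models might be instantiated as follows; Xtr denotes the 6G × p feature matrix and Ttr the 6G × 1 target vector of Sect. 3.2, and the Fuzzy Logic, Statistics and Machine Learning, and Deep Learning toolboxes are assumed.

```matlab
% Illustrative instantiation of the six techniques with the settings above.
fis0 = genfis(Xtr, Ttr, genfisOptions('GridPartition'));      % initial Sugeno FIS
fis  = anfis([Xtr Ttr], anfisOptions('InitialFIS', fis0));    % ANFIS training
net  = feedforwardnet(10, 'trainlm');                         % ANN-BP, 10 hidden neurons
net  = train(net, Xtr', Ttr');
rbf  = newrb(Xtr', Ttr', 0.05, 1, 100);                       % RBF: goal 0.05, spread 1, <=100 neurons
svr  = fitrsvm(Xtr, Ttr, 'KernelFunction', 'gaussian', 'Epsilon', 0.3);  % SVR
% Deep models: each feature vector is treated as a one-channel sequence of length p.
cnnLayers  = [sequenceInputLayer(1); convolution1dLayer(5, 16); reluLayer; ...
              globalAveragePooling1dLayer; fullyConnectedLayer(1); regressionLayer];
lstmLayers = [sequenceInputLayer(1); lstmLayer(40); ...
              lstmLayer(40, 'OutputMode', 'last'); ...
              fullyConnectedLayer(1); regressionLayer];
opts = trainingOptions('adam');                               % Adam optimizer
% e.g., lstmNet = trainNetwork(seqTrain, Ttr, lstmLayers, opts);
```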

3.4 Decision on the process condition

Considering the definition of target values, Hosseinifard et al. [53] set the LCL of their proposed method to 0 and denoted the UCL as Cutting Value (CV). The CV is adjusted by simulations to reach the desired value of ARL0. After adjustment of the CV, the output of the considered machine learning technique, e.g., ANN, is compared with the CV to make a decision on the process [43]. If the output of the ANN in the jth sample (Oj) is larger than the CV, this indicates an OC condition (see Fig. 1 in Hosseinifard et al. [53] for more details).

By employing this approach in our proposed framework, we can identify the process condition when comparing the output Oj of a machine learning technique with the respective CV. For better understanding, pseudo code 2 illustrates the procedure of reaching an OC signal in one iteration of simulations when input combination III is used. To compute the ARL and SDRL by means of Monte Carlo simulations, this procedure is iterated 10,000 times.

Pseudo code 2: Reaching an OC signal in one simulation iteration with input combination III
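
A compact MATLAB rendering of pseudo code 2 under stated assumptions follows: mdl is any trained model of Sect. 3.3, evalModel() is a hypothetical wrapper around the respective prediction call (e.g., evalfis for ANFIS, predict for SVR), and simProfile() / makeFeatures() are hypothetical helpers that generate the next profile from (2) and assemble the 10 features of Sect. 3.1.

```matlab
% One simulated run length for a given OC parameter setting.
RL = 0; signal = false;
yPrev = simProfile(A0, A1, sigma, phi, X, []);               % IC warm-up profile
while ~signal
    RL = RL + 1;
    yCurr = simProfile(A0oc, A1oc, sigmaOC, phi, X, yPrev);  % next (OC) profile
    Oj = evalModel(mdl, makeFeatures(yPrev, yCurr));         % model output O_j
    signal = Oj > CV;                                        % OC signal once O_j > CV
    yPrev = yCurr;
end
% Averaging RL over 10,000 such iterations yields ARL1; its standard deviation gives SDRL1.
```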

4 Simulation study

To show the effectiveness of our proposed method, a comprehensive simulation study is conducted in this section. To compare the six machine learning techniques ANFIS, ANNBP, CNN, LSTM, RBF and SVR, the CV of each method is set to reach an ARL0 equal to 200, as this is the most common value in profile monitoring. The next aim is to find the best of the three input combinations I, II and III. To this end, the three input combinations under different parameter settings are used as inputs to the machine learning techniques. Due to the page limit, we only present the results of the three input combinations for the shift in the intercept under ϕ = 0.1 in Table 1; other results are available from the authors upon request. As can be seen, nearly all machine learning techniques produced the best (i.e., lowest) ARL1 values for input combination III. This is due to the fact that input combination III combines both main effects of input combinations I and II, namely:

Table 1 ARL1 comparisons considering all shifts in the intercept (λ) for the input combinations I, II and III under ϕ = 0.1

(1) appropriately addressing the raw AR(1) structure of the underlying model, and

(2) the effects of the current and previous T2 statistics as well as of the current and previous error variances.

In other words, considering only (2), i.e., using input combination II, is not enough to reach proper results. On the other hand, concentrating on (1), i.e., using input combination I, yields better performance on average than (2). The combination of (1) and (2) via input combination III clearly strengthens the effect of (1) and leads to superior results.

Similar results are obtained for the remaining parameter settings; thus, we only present the results of input combination III for the rest of the experiments. In addition, due to the first-priority importance of detecting small shifts in the underlying process, the simulation studies mainly focus on smaller shifts. For larger shifts, generally the same patterns emerge as for smaller shifts.

For comparisons of single shifts in intercept, slope and standard deviation, the IC model is taken from Noorossana et al. [44] and Wang and Huang [48], where A0 = 3, A1 = 2 and σ2 = 1. In addition, the explanatory variables have the values 2, 4, 6, 8 (n = 4), and 0.1, 0.5, 0.9 are considered as fixed values of ϕ. For comparisons of simultaneous shifts, the IC model is extracted from Wang and Lai [45].

In Sect. 4.1, the performance of various machine learning techniques is compared and the method with the best performance is selected. In Sect. 4.2, the selected machine learning technique and conventional statistical control charts in Phase II profile monitoring are compared. Finally, Sect. 4.3 reports the performance of the best approach for simultaneous shifts in profile parameters.

4.1 Comparing different machine learning techniques based on input combination III

In this subsection, three individual shifts are considered for each parameter to compare the performance of the techniques, where the shifted parameters are A0 + λσ, A1 + ησ and γσ. The values of ARL1 associated with the different machine learning techniques are given in Table 2. To provide a fair comparison, the values of SDRL1 are additionally reported in Table 3 for each of the considered shifts. Bold values represent the approach with the best performance.

Table 2 ARL1 comparisons considering small shifts in intercept (λ), slope (η) and standard deviation (γ) for the input combination III and various values of ϕ
Table 3 SDRL1 comparisons considering small shifts in intercept (λ), slope (η) and standard deviation (γ) for the input combination III and various values of ϕ

As can be seen in Table 2, ANFIS obtains the best performance in terms of ARL1 for small shifts in intercept and slope, given ϕ = 0.1 and ϕ = 0.5. As for ϕ = 0.9, mostly ANNBP (λ = 0.6, 0.8, η = 0.075, 0.1, 0.125) and LSTM (λ = 0.2, η = 0.025, 0.05) produce better results. These results are also reflected in the values of SDRL1 with only a few exceptions (see Table 3). As for small shifts in the standard deviation, RBF outperforms other machine learning techniques, with only two exceptions for ϕ = 0.9, γ = 1.8 and 2, where CNN performs marginally better. However, RBF is not a preferable approach in comparison with ANFIS regarding shifts in intercept and slope.

Note that the deep learning techniques CNN and LSTM are generally not able to reach comparable results for most shifts, especially for smaller ones. These findings can be explained against the backdrop that deep neural networks outperform shallow networks mainly in problems with a large number of features, such as image processing and Natural Language Processing (NLP) [62, 63].

Based on the above findings and numerical evidence for a wider range of shifts, ANFIS is the best of the considered machine learning approaches regarding shifts in intercept and slope, and RBF is the superior method regarding shifts in the standard deviation. However, for ϕ = 0.9, ANFIS and RBF are not consistently the best methods. To illustrate this issue, we additionally provide the ARL1 and SDRL1 results of all techniques for ϕ = 0.9 over a wider range of shifts in Tables 4 and 5, respectively.

Table 4 ARL1 comparisons considering a wider range of shifts in intercept (λ), slope (η) and standard deviation (γ) for the input combination III and ϕ = 0.9
Table 5 SDRL1 comparisons considering a wider range of shifts in intercept (λ), slope (η) and standard deviation (γ) for the input combination III and ϕ = 0.9

To reach a better judgment about the results of Tables 4 and 5, the Relative Mean Index (RMI) is used to select the best machine learning technique. This measure is frequently utilized in SPM (see, for example, Han and Tsung [64], Perry [65] and Yeganeh et al. [24]) and considers the average relative difference from the best approach in each treatment; the smaller the RMI, the better the detection ability. Table 6 reports the values of the RMI based on the simulation results given in Tables 4 and 5.
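
For reference, a common formulation of the RMI over N shift scenarios (stated here for ARL1; the SDRL1 version is analogous) reads:

$$ \mathrm{RMI} = \frac{1}{N}\sum_{k = 1}^{N} \frac{\mathrm{ARL}_{1,k} - \mathrm{ARL}_{1,k}^{\min } }{\mathrm{ARL}_{1,k}^{\min } }, $$

where \(\mathrm{ARL}_{1,k}\) is the ARL1 of the chart under study for the kth shift and \(\mathrm{ARL}_{1,k}^{\min }\) is the smallest ARL1 among all competing charts for that shift.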

Table 6 Comparing different machine learning techniques based on the RMI criterion with respect to simulation results given in Tables 4 and 5 (ϕ = 0.9)

The order of machine learning techniques in terms of ARL1 (SDRL1) for shifts in the intercept is RBF, ANNBP, ANFIS, SVR, LSTM, CNN (ANNBP, ANFIS, RBF, SVR, LSTM, CNN). As for shifts in the slope, the order in terms of both ARL1 and SDRL1 is ANNBP, RBF, ANFIS, SVR, LSTM, CNN, while the order for shifts in the standard deviation is CNN, ANFIS, ANNBP, RBF, LSTM, SVR for both ARL1 and SDRL1. To select the machine learning technique with the best average performance regarding all the shifts, we consider the average of the RMI values in terms of ARL1 and SDRL1, respectively. Here, ANFIS (ANNBP) has the best overall average performance because the average of RMI with respect to ARL1 (SDRL1) is 0.570 (1.598), but the difference in SDRL1 between ANNBP and ANFIS (1.598 vs. 1.931) is negligible. According to these results, ANFIS is selected as the best machine learning technique, and thus we use it as benchmark technique in the subsequent simulations.

The superiority of ANFIS could be due to its two main trainable components, the antecedent and the consequent part, which are connected by fuzzy IF–THEN rules. By jointly processing the effects of the current and previous T2 statistics, the current and previous error means, and the current and previous coefficient estimates, ANFIS increases the chance of detecting unnatural trends. A similar conclusion was reached by Aziz Kalteh and Babouei [66], who reported a suitable performance of ANFIS in control chart pattern recognition problems.

4.2 Comparing the best machine learning technique with conventional statistical approaches

In this subsection, the performance of ANFIS as the best machine learning technique is compared with conventional statistical control charts based on individual and simultaneous shifts in intercept, slope and standard deviation for ϕ = 0.1. Because RBF performs better than ANFIS regarding shifts in the standard deviation, RBF is also included in the comparisons. Three competitors, T2, EWMA/R and EWMA-3, are selected following Noorossana et al. [44]. Table 7 shows the ARL1 results for ANFIS and the competitors. Note that the setups are the same as in the previous subsection, and the results of ANFIS and RBF are taken from Table 2.

Table 7 Comparing the performance of ANFIS and RBF with conventional statistical competitors considering small shifts in intercept (λ), slope (η) and standard deviation (γ) in terms of ARL1 (ϕ = 0.1)

As can be seen, ANFIS performs considerably better than the other methods for shifts in intercept and slope. The difference is especially large for smaller shifts; for example, ARL1 is 3.096 for ANFIS given λ = 0.4, while it is 99.7, 21.9 and 19.7 for the conventional competitors T2, EWMA/R and EWMA-3, respectively. However, EWMA-3 and RBF obtain the best results for shifts in the standard deviation; in particular, the statistical methods outperform ANFIS for this type of shift.

4.3 Comparing the best machine learning technique with conventional statistical approaches considering simultaneous shifts

In industrial processes, simultaneous shifts may occur, so a control chart should also be able to detect this type of shift. Wang and Lai [45] conducted several simulations on simultaneous shifts with the IC model proposed by Noorossana, Amiri and Soleimani [44]. In the following, we compare ANFIS with the two schemes reported in Wang and Lai [45], i.e., T2 and MEWMA. MEWMA is an advanced version of the EWMA control chart that integrates the effect of previous samples into one statistic, and some researchers have reported that this approach performs very well in profile monitoring [17, 21, 59]. Table 8 shows the results in terms of ARL1 for simultaneous shifts in intercept and slope. Note that we restrict the comparison to the location parameters and do not consider further simultaneous shifts involving the standard deviation, due to the superior performance of ANFIS regarding shifts in intercept and slope. This is also in line with the approach proposed in Wang and Lai [45].

Table 8 Comparing the performance of ANFIS with conventional statistical competitors considering simultaneous small shifts in intercept (λ) and slope (η) in terms of ARL1 (ϕ = 0.9)

According to Table 8, ANFIS outperforms both other methods. The deviations in terms of ARL1 are tangible; for example, the values of ARL1 are 135.73, 180.42 and 196.15 (1.91, 34.22 and 49.64) for ANFIS, MEWMA and T2, respectively, for the smallest (largest) shift λ = 0.2 and η = 0.025 (λ = 1 and η = 0.125). While there is no distinct trend for the absolute deviations between the values of ARL1 of ANFIS and each of both competitors for increasing shift sizes, there is generally an increasing behaviour for the corresponding relative deviations, i.e., \(\frac{{{\text{ARL}}_{1}^{{{\text{MEWMA}}}}- {\text{ARL}}_{1}^{{{\text{ANFIS}}}} }}{{{\text{ARL}}_{1}^{{{\text{ANFIS}}}} }}\,{\text{and}}\, \frac{{{\text{ARL}}_{1}^{{T^{2} }} - {\text{ARL}}_{1}^{{{\text{ANFIS}}}}}}{{{\text{ARL}}_{1}^{{{\text{ANFIS}}}} }}\)regarding small shifts (see Table 8). That is, the larger the shifts in slope and/or in intercept, the larger the relative deviations.

As for larger shifts in slope and/or intercept (λ>1 and η > 0.125, not tabulated due to lower relevance), the values of ARL1 regarding ANFIS decrease to a small extent, while the values of ARL1 regarding MEWMA and T2 become closer to the respective ARL1 values of ANFIS, i.e., we observe a decreasing behavior for the corresponding relative deviations regarding larger shifts. To sum up, ANFIS clearly outperforms both methods in detecting simultaneous shifts and its detection ability is especially better for lower shift sizes.

Statistical control charts usually require the fulfilment of some principal assumptions to reach their best performance, while complicated patterns in the manufacturing process may invalidate some of these assumptions and thus deteriorate their performance. In contrast, machine learning techniques face fewer challenges, provided that the input combinations and the training procedure are defined properly. From the above results it can be concluded that machine learning-based techniques, and especially ANFIS, perform better than conventional statistical methods when monitoring autocorrelated profiles; however, some computational effort may be required when implementing these approaches. Thanks to online data collection systems in real applications, big data storage and high-performance computers, this challenge is becoming manageable, so that machine learning-based systems can automatically analyze process data to identify OC situations. To this end, the definition of proper input features, dataset development, adequate training adjustment and acceptable false alarm rates are essential tasks. These steps are usually performed in an off-line modelling phase, while the operational (online) phase refers to applying the trained model to online data to determine the process condition [57, 60]. With this procedure, the machine learning-based approach proposed in this paper can improve the monitoring of industrial processes in terms of OC detection ability.

5 Illustrative example

In this section, an illustrative example from a chemical process is presented to demonstrate a real application. In fact, this example can be considered a calibration system in the chemical industry. Sometimes it is necessary to control a chemical process far from the laboratory via remote schemes in which gas sensors act as the controller, and these sensors are used to monitor the chemical process over time. Although this is a beneficial approach, it requires recalibration whenever the sensors' adjustments change, as the variability of gas sensors may affect the performance of the underlying calibration model [67]. Such changes may be caused by different chemical materials, process conditions and equipment movements, so the calibration should be checked over time. Profile monitoring can be applied to address these calibration issues and for online monitoring of the process.

For these reasons, studies such as Mahmood et al. [68] and Nadi et al. [36] suggested applying profile monitoring. A metal oxide (MOX) sensor, a conductometric type of gas sensor, is one of the best options due to its sensitivity, operational ease, cost efficiency, rapid response, and the capability of detecting a large number of volatiles. The authors considered a MOX sensor and monitored the functional relationship between the resistance (R) of the sensor as the dependent variable and the concentration of carbon monoxide at the sensor as the independent variable.

To monitor this functional relationship, they recorded the sensor resistance at different concentration levels over time. Based on the recorded data, the explanatory variables are fixed at 25, 100, 125, and 150 ppm. To reach a better performance, it is suggested to change the process situation with some additives. These substances are blended into a process to accelerate the processability of the polymers, improve characteristics such as durability and stiffness, and enhance the service life. A wide range of additives, such as gas, feed, anti-wear, food, fuel, antioxidant and plastic additives, has been developed to date. Indeed, gas additives are usually added to gas sensor processes to adjust the flow of gas during the experiment [67]. However, previous works showed that the relation between resistance and carbon monoxide concentration might change when additive materials are present. To address these issues, Nadi et al. [36] investigated the situations before and after adding the additive material, where one additive material was added to the process after time 3278; hence, the IC model was estimated from the first 3278 profiles. Considering these profiles, Nadi et al. [36] obtained a simple linear IC model with an autocorrelation effect as follows:

$$ \begin{aligned} &Y_{ij} = 71.741 + 0.0176X_{i} + \varepsilon_{ij} ,\\ &\varepsilon_{ij} = 0.565\varepsilon_{{i\left( {j - 1} \right)}} + a_{ij} ,a_{ij} \sim N\left( {0,0.142} \right),\,i = 1,2,3,4;\,j = 1,2, \ldots \end{aligned} $$
(13)

To show the applicability of the proposed method for monitoring the above IC model, Nadi et al. [36] utilized simulations for OC data generation (instead of using the data after the 3278th profile). Following them, we first generated five IC profiles and then continued with OC profile generation considering a shift in the intercept until an OC signal was reached. The magnitude of the OC shift was set to 0.15 (or 0.5σ). Table 9 shows the response variables of the generated profiles (the black and red values belong to IC and OC profiles, respectively).

Table 9 The IC (black) and OC (red) generated responses in the illustrative example

To assess the detection ability of ANFIS for this data set, it is trained based on the IC model in Eq. (13) and input combination III. Considering ARL0 = 200, the CV is set to 0.615. After adjustment of the CV, the generated data in Table 9 are fed into ANFIS and the output for each input is computed. Table 10 reports the input and output values for the first seven generated profiles. ANFIS needs only two OC samples to trigger a signal: the signal appears at the 7th sample because the final statistic exceeds the CV (red horizontal line in Fig. 2), i.e., O7 = 0.849 > 0.615 = CV.

Table 10 Input and output values for the first seven generated profiles applying ANFIS with input combination III (ϕ = 0.565)
Fig. 2 The final chart statistics of the first seven randomly generated profiles in the illustrative example (ϕ = 0.565)

6 Conclusions

In profile monitoring, the error term often does not follow a simple structure and is affected by autocorrelation. For this reason, a novel monitoring scheme for linear profiles with between-autocorrelation of first order in Phase II of process monitoring has been proposed in this paper. Unlike most existing methods that use common statistical control charts, this paper employed various machine learning techniques, namely ANFIS, ANNBP, CNN, LSTM, RBF and SVR, as control charts. To this aim, four main steps were defined. In the first step, the structure of the input features and outputs of a machine learning technique was determined; then, a training data set based on the input features and outputs was generated by simulating IC and OC profiles in the second step. The third step utilized the generated data set to train a machine learning technique, and finally, by the definition of a control limit, the machine learning technique provided information about the process condition in the fourth step.

The study pursued three main objectives. First, due to the high importance of input features in machine learning, several input features suitable to account for between-autocorrelation effects of first order were defined and compared to determine the most appropriate input combination. The results indicated that input combination III, defined as the union of input combinations I and II and thus combining both of their main effects, is the most appropriate one. Second, different machine learning techniques were compared to identify the most adequate one. Experimental studies showed that ANNBP, CNN, LSTM and SVR were mostly not able to reach a satisfactory detection ability in comparison with ANFIS and RBF. Between ANFIS and RBF, ANFIS was preferable with respect to shifts in intercept and slope, while RBF performed best regarding shifts in the standard deviation. This superiority was obvious for low and moderate autocorrelation coefficients (i.e., ϕ = 0.1 and 0.5), whereas no consistently best method could be identified for a larger value (ϕ = 0.9). To address this issue, we additionally implemented an overall performance measure, the RMI, according to which ANFIS turned out to be the method with the best overall average performance for ϕ = 0.9. Third, machine learning-based techniques were compared with statistical control charts. This comparison showed that the detection ability of ANFIS outperformed all competitors regarding shifts in intercept and slope. However, the detection ability of ANFIS regarding shifts in the standard deviation was inferior to the selected statistical control charts; here, the EWMA-3 control chart performed better, and the best machine learning technique for this purpose was RBF (with a performance hardly worse than that of EWMA-3). Hence, machine learning-based control charts, and ANFIS in the first place, are suggested for profiles characterized by between-sample AR(1) autocorrelation to considerably improve the detection ability of the control chart.

Employing the proposed novel input features with other machine learning techniques and other profile types, such as nonlinear or Generalized Linear Models (GLMs), in the presence of autocorrelation could be a promising avenue for future research. Also, implementing the proposed method for profiles characterized by within-sample autocorrelation, or for profiles with other autocorrelation patterns such as ARMA or VARMA, are further suggestions for potential future directions.