Control chart based on residues: Is a good methodology to detect outliers?

The purpose of this article is to evaluate the application of forecasting models, along with residual control charts, to assess production processes whose samples are autocorrelated. The main objective is to determine the efficiency of control charts for individual observations (CCIO) and exponentially weighted moving average (EWMA) charts in detecting outliers in autocorrelated processes when they are applied to the residuals of AR(1) or MA(1) models. The series were simulated over combinations of the strength and sign of the autocorrelation and of the outlier range, yielding 640,000 simulated series for each model. Each series was contaminated by an anomalous observation at the 100th position, an AR(1) or MA(1) model was fitted, and the residuals were evaluated by CCIO and EWMA control charts; the points correctly detected as outliers were recorded. For the parameters investigated (autocorrelation and outlier range), a detection rate was computed for each chart, and nonparametric comparison tests were applied. The tests showed the superiority (p < 0.05) of the CCIO chart for both models. The sign and magnitude of the autocorrelation parameter showed no significant influence (p > 0.05) for either chart or either model, AR(1) or MA(1). In this context, control charts for individual observations (CCIO) were confirmed to effectively detect outliers through residuals in industrial autocorrelated processes originating from first-order AR and MA models.


Introduction
The statistical process control (SPC) methodology is used by companies and industries to evaluate conformance to specification standards and process capability on the basis of control charts, in order to promote continuous improvement (He and Wang 2018; Woodall 2000; Peres and Fogliatto 2018). The importance of studying this theme is shown by the still incipient growth in the number of articles indexed in the Web of Science journal repository, even though the subject is no less important to industry. The growth of the scientific production on the researched subject is shown in Fig. 1, where we find only 188 works across the most diverse areas of knowledge. Just 28 of these works, published from 1992 to 2017, are from the industrial engineering area. This shows that the theme "statistical process control" can still be explored and that a gap exists in the literature. For Montgomery (2004), statistical quality control techniques can be applied in industrial and non-industrial processes in order to detect and correct anomalies in the process and meet the specified target of a given product. For Duarte and Saraiva (2008), Toledo et al. (2008) and Bouslah et al. (2018), SPC is an important management tool for improving quality and increasing the productivity of industrial processes, used to reduce process variability.
SPC uses different evaluation techniques applied to autocorrelated data, such as time series analysis, regression analysis, artificial neural networks and support vector regression (Du and Lv 2013; Du and Zhang 2016). We notice that research on outlier detection in correlated processes using neural networks is rare in the literature. Lalor and Zhang (2001) presented a multivariate case to detect three types of outliers: range, spatial and relationship outliers. They showed that neural networks are a free and effective modelling approach to detect irregular data. Guo and Xue (2012) used statistical methods to detect outliers and treat these discrepancies before carrying out training and extrapolation through neural networks. The conclusion was that statistical methods and neural networks together yield more accurate analysis and prediction.
In order to evaluate water quality control, methodologies such as artificial neural networks (ANN), principal component analysis (PCA) and universal process modelling (UPM) were used by Cancilla and Fang (1996) to characterize quality variables of the Niagara River. In this research, the methodologies such as PCA and UPM were able to capture outliers and ANN was used to carry out the predictions.
The basic quality control activities rely on reducing quality variations, so that the magnitude of the variance can be estimated and the potential variability identified, resulting in the minimization of potential losses (Christino et al. 2010). Control charts, in turn, were created for the purpose of monitoring the process by observing the average and standard deviation, as highlighted by McCracken and Chakraborti (2013) and Trafimow et al. (2018). However, the relevant statistical literature shows that other control charts could be used to control quality and productivity (Abbas 2018).
Statistical process control charts have been used as a process monitoring tool to evaluate the effect of the regulation of the rotary separation system on the losses observed during mechanized harvesting. Cunha et al. (2014) observed that the control charts were effective tools for detecting the total losses of the industrial tomato crop, because they showed that the process was out of control and that adopting corrections in the process would provide better harvesting efficiency. Voltarelli et al. (2015) used control charts as an indicator of quality in the monitoring of losses in mechanized harvesting of sugarcane in the Triângulo Mineiro region, in the State of Minas Gerais, Brazil. Control charts for individual values and for exponentially weighted moving averages were used; of the two, the control chart for individual values was the one that responded well in the monitoring of losses in mechanized harvesting of sugarcane.
The simultaneous monitoring of a process using a single nonparametric control chart is easy for process controllers to use, because it allows them to identify the presence of special causes in the process and does not violate the assumption of normality present in traditional control charts (McCracken and Chakraborti 2013).
Fig. 1 Number of publications on the topic statistical process control in the Web of Science
Dumičić and Žmuk (2015) used exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) charts to support decision making in the stock market trading process. The EWMA and CUSUM control charts were used for the purpose of pointing out stock purchase and sale signals. The control charts did not perform satisfactorily in the presence of anomalies such as non-normality and autocorrelation, which degraded their performance. The importance of control charts is also highlighted by Aparisi et al. (2018), who demonstrate that process quality has improved considerably in relation to previous decades and that the majority of the samples collected in high-quality processes do not contain defective units. Silva et al. (2017) developed a strategy for continuous monitoring based on multivariate statistical process control in the Consigua™25 tablet manufacturing line. This strategy was based on the technique of principal component analysis, while the impact of the deviations imposed on the continuity of the process was evaluated from the residues through Hotelling T² control charts. The results showed that the imposed deviations were detected in the control chart, demonstrating the effectiveness of its application, mainly in monitoring the temperature control of the granulator drum.
Durmusoglu (2018) proposed an approach to detect abnormal deviations by updating the methodology of time series forecasting models using control charts. The purpose was to monitor the residues (the difference between actual and expected values of the variable of interest) continuously using control charts, in order to ascertain whether such residues were outside the established limits, which would signal an alert or a process change.
Control charts are usually planned and evaluated assuming that consecutive observations of the process are independent and identically distributed (i.i.d.). The observations must also meet the assumptions of normality and homoscedasticity (constant variance), and they cannot show autocorrelation characteristics. However, Montgomery (2004), Montgomery and Runger (2003) and Claro et al. (2007) emphasize that although independence among the observations is the most important assumption, it is often violated in practice. This is because, in general, manufacturing processes are governed by inertial elements, and when the interval between observations becomes small compared to these forces, the observations become correlated over time.
Failure to meet these assumptions when applying control charts can result in a significant increase in false alarms, an unwanted factor that not only increases control costs but also leads to wrong conclusions and, as a consequence, causes the operator to lose credibility (Costa et al. 2004; Del Castillo 2002). So, the alternative used in an autocorrelated process is to fit an ARIMA model and use the residues produced by this model to evaluate the process (Veiga et al. 2016; Kalavani et al. 2019).
However, it should be noted that the residuals produced by the models must be approximately normal and independent with zero mean and constant variance, fully satisfying the assumptions on the proper use of control charts. This procedure is one of the major alternatives to avoid problems caused by the violation of assumptions of no correlation among the observations (Del Castillo 2002).
The aim of this research is to use the statistical process control (SPC) technique applied to univariate time series in order to determine the efficiency of control charts for individual observations (CCIO) and the exponentially weighted moving average (EWMA) chart in detecting outliers in autocorrelated processes (Miranda 2001; Santos and Barreto 2018). The charts are applied to residuals originating from an autoregressive (AR) or moving average (MA) process. In addition to demonstrating the influence of the autocorrelation strength of the process, this study also verifies the detection power of the charts in relation to the magnitude of the anomalous observation.
The detection and treatment of outliers is important to obtain a better fit and a smaller prediction error, because forecast models assume that outliers have been pretreated (Veiga et al. 2010; Bashiri and Moslemi 2013; Puchalski et al. 2018). Ghomi and Sogandi (2018) showed that many actual production processes are contaminated by a continuous stream in correlated data. So, it is important to distinguish trends, seasonalities and outliers.
These outliers can often go unnoticed in residual control charts that originate from autocorrelated processes because, according to Chang (1982), the mathematical model used to remove the serial correlation can incorporate the behaviour of the outlier in its structure, reducing its effect in the residual series and thus hindering outlier detection. It should also be noted that the presence of outliers widens the control limits because the variability of the process is increased.
Outliers and structural changes are often found in time series analysis, and they may be associated with unexpected or uncontrollable events. Such discrepant observations may compromise the usual methods of time series analysis (Miranda 2001). The presence of an outlier can seriously bias the least squares estimates of the parameters of an ARMA model. Palma (1999) and Rounaghi and Zadeh (2016) explain that studies on outliers in time series are relatively scarce when compared to studies in the field of linear regression. This is due to the multiplicity of ARIMA models, namely AR(p), MA(q), ARMA(p, q) and ARIMA(p, d, q), which requires various detection mechanisms (Palma 1999).
This research proposes a methodology that joins statistical process control and engineering process control to identify outliers in autocorrelated data, using control charts applied to residuals from autoregressive and moving average models.

Methodology
The methodological steps suggested below were used to test the efficiency of residual control charts in outlier detection in autocorrelated processes, as well as the variables that influence such efficiency, represented in this research by outlier range and the autocorrelation coefficient.

Database
To accomplish the main purpose of this research, autoregressive (AR) and moving average (MA) processes were generated under the following restrictions: eight autocorrelation parameters (± 0.5, ± 0.6, ± 0.7 and ± 0.8) and eight outlier ranges (1σ; 1.5σ; 2σ; 2.5σ; 3σ; 3.5σ; 4σ; and 4.5σ) were combined in each model, yielding a total of 64 possible combinations. The outliers were inserted at the 100th position of each simulated series. In total, 10,000 series were simulated for each combination of autocorrelation parameter and outlier range in order to make the estimated performance of the control charts more robust and, thus, to determine the percentage of outlier detection. Hence, 640,000 series were simulated for each model, AR(1) and MA(1). Figure 2 shows a flow chart comprising the twelve steps that were followed in this research and the decisions that were taken.
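The size of the simulation design can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are introduced here and do not come from the original study, which used R and Statistica.

```python
from itertools import product

# Eight autocorrelation parameters: magnitudes 0.5 to 0.8 with both signs.
autocorrelations = [sign * mag for mag in (0.5, 0.6, 0.7, 0.8) for sign in (1, -1)]

# Eight outlier ranges, expressed in multiples of the series standard deviation.
outlier_ranges = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

SERIES_PER_COMBINATION = 10_000  # replications per cell, as stated in the text

design = list(product(autocorrelations, outlier_ranges))
total_series = len(design) * SERIES_PER_COMBINATION

print(len(design), total_series)  # prints: 64 640000
```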

Methodological steps
The twelve steps (S1-S12) are listed in Chart 1.
Chart 1 Twelve steps (S1-S12)
S1 Simulate a time series with 200 observations, with μ_0 = 0 and constant variance, through a data generating process such as AR(1) or MA(1). The parameters (φ or θ) must give positive and negative autocorrelations, with magnitudes of 0.5, 0.6, 0.7 and 0.8.
S2 Fit the series with AR(1) and MA(1) models using the Box and Jenkins (1970) methodology.
S3 Check, through diagnostic tests, whether the residuals meet the assumptions required by control charts (Box et al. 1994; Werner and Ribeiro 2003).
S4 Validate the residuals obtained from the AR and MA processes by means of a control chart for individual observations (CCIO) and a moving range (MR) chart (Montgomery 2004; Montgomery and Runger 2003; Claro et al. 2007; Costa et al. 2004). If the residual series has no points outside the control limits and no particular pattern is identified, the simulated series is considered stable and is used for research purposes. If the residuals are not statistically under control, the simulated series is discarded and considered not valid for the study.
S5 After the validation step, introduce an outlier at the 100th position of the simulated data series. The outlier range takes the magnitudes 1σ; 1.5σ; 2σ; 2.5σ; 3σ; 3.5σ; 4σ; and 4.5σ, where σ represents the standard deviation obtained from the original data series. To avoid a biased analysis, if the observation of the original series at the insertion point is positive, the outlier to be inserted is also positive, keeping the same movement of the series; if the original observation is negative, the outlier to be introduced is also negative.
S6 Once the outlier has been introduced, fit the series again in order to eliminate the autocorrelation effect (Fava 2000) and obtain residuals that meet the conditions for applying control charts.
S7 Construct CCIO and EWMA control charts with the residuals of the original series contaminated by the outlier. In the EWMA chart, the parameters λ = 0.2 and L = 2.86 must be used because, according to Montgomery (2004), they give an average run length (ARL) equal to 370, a value similar to the ARL of the CCIO chart. Thus, an effective comparison can be made between the two types of control charts; Box and Luceño (1997) also suggest such parameters.
S8 Check whether the observation marked as the outlier falls outside the control limits, that is, whether it is detected by the control charts.
S9 Record outlier detection by the control chart in order to compute the detection rate.
S10 Repeat steps S1 to S9 until 10,000 series are obtained for each combination of autocorrelation parameter (φ or θ) and outlier range.
S11 After obtaining the 10,000 series for each combination, determine the detection rates and present the results for each control chart.
S12 Answer the following questions based on the simulations and the detection rate of the charts: (i) Which residual control chart, CCIO or EWMA, is more efficient in detecting an outlier with variable range? (ii) Is there a significant difference between positive and negative autocorrelation parameters for each chart? (iii) Is there any significant influence of the autocorrelation value on the detection power of the charts for each outlier range?
Steps S1 to S10 were simulated with the free statistical software R; the AR(p) and MA(q) filters were fitted, and the control charts and nonparametric tests were run, with the Statistica 7.0 package.
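The core of steps S1, S5, S6, S8 and S9 can be sketched in Python for the AR(1) case with the CCIO chart. This is not the authors' R and Statistica code; it is a minimal illustration under several assumptions that are not in the original: the outlier is additive, the AR(1) coefficient is estimated by conditional least squares rather than the full Box and Jenkins procedure, the CCIO limits use the average moving range divided by 1.128, and the validation steps S3 and S4 are skipped. All function names are hypothetical.

```python
import random
import statistics

OUTLIER_POS = 100  # 1-based position of the contaminated observation


def simulate_ar1(phi, n=200, burn_in=100, rng=random):
    """S1: zero-mean AR(1) series x_t = phi * x_{t-1} + a_t, a_t ~ N(0, 1)."""
    x, series = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the warm-up observations
            series.append(x)
    return series


def insert_outlier(series, k_sigma, position=OUTLIER_POS):
    """S5 (assumed additive): shift one point by k_sigma series standard
    deviations, in the direction of that point's own sign."""
    out = list(series)
    sigma = statistics.stdev(series)
    sign = 1.0 if out[position - 1] >= 0 else -1.0
    out[position - 1] += sign * k_sigma * sigma
    return out


def fit_ar1_residuals(series):
    """S6 (assumed estimator): conditional least-squares AR(1) fit."""
    mean = statistics.fmean(series)
    c = [x - mean for x in series]
    phi_hat = sum(c[t] * c[t - 1] for t in range(1, len(c))) / sum(v * v for v in c[:-1])
    # residuals[i] belongs to the observation with 0-based index i + 1
    return phi_hat, [c[t] - phi_hat * c[t - 1] for t in range(1, len(c))]


def ccio_limits(residuals):
    """Individuals-chart limits: mean +/- 3 * (average moving range / 1.128)."""
    center = statistics.fmean(residuals)
    moving_ranges = [abs(residuals[i] - residuals[i - 1]) for i in range(1, len(residuals))]
    sigma_hat = statistics.fmean(moving_ranges) / 1.128
    return center - 3 * sigma_hat, center + 3 * sigma_hat


def detection_rate(phi, k_sigma, n_series=1000, seed=0):
    """S8-S11: share of series whose contaminated point falls outside the limits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_series):
        dirty = insert_outlier(simulate_ar1(phi, rng=rng), k_sigma)
        _, res = fit_ar1_residuals(dirty)
        lcl, ucl = ccio_limits(res)
        r = res[OUTLIER_POS - 2]  # residual of the 100th observation
        hits += r < lcl or r > ucl
    return hits / n_series


if __name__ == "__main__":
    for k in (1.0, 3.0, 4.5):
        print(k, detection_rate(0.5, k))
```

Because of the simplifications listed above, the rates printed by this sketch should not be expected to reproduce the values reported in Tables 1 to 4.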

Results and discussion
This section first discusses the autoregressive model, then the moving average model, and closes with a discussion covering both models.

Autoregressive model AR(1)
The series with outliers of different magnitudes inserted at the predetermined position were fitted again, and the residuals from the model were evaluated with the control charts. Thus, the analysis of the residual series verified whether the charts were effective in detecting the outliers previously inserted in the original series. Table 1 shows the rate of outlier detection in an AR(1) process by the CCIO control chart. The values are arranged as a function of the autocorrelation parameter φ and the outlier range. The data in Table 1 indicate, for example, that for an autocorrelation parameter of 0.5 and an outlier with a range of 1σ, the detection power is 0.0449; that is, in this case the CCIO control chart detected the outlier in 449 of the 10,000 simulated series.
Visual analysis of the data in Table 1 shows that the autocorrelation affected the detection of the outlier by the CCIO control chart, except for the small outlier ranges, 1σ and 1.5σ, where the detection rate did not vary appreciably with the autocorrelation magnitude. The reduction in detection power as the autocorrelation parameter value increases is best viewed in Fig. 3, which displays a chart of detection efficiency as a function of the autocorrelation parameter for each outlier range.
For moderate positive autocorrelation whose order is φ = 0.5, CCIO charts were observed to have a higher detection efficiency, with a decrease in value as the autocorrelation changed from moderate to strong. The same occurred in negative autocorrelations, where for moderate values (φ = − 0.5), efficiency was higher in comparison with the value of φ = − 0.8, which represents a strong autocorrelation.
This behaviour can be explained by the fact that, after the outlier is inserted in the simulated series, the series is fitted again to obtain the residuals. Thus, the residual corresponds to the difference between the original observation and the past observation multiplied by the autocorrelation parameter. The larger the autocorrelation parameter, the smaller the difference and consequently the smaller the residual, and a residual of smaller magnitude is harder for the control charts to detect.
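The residual described above can be written out explicitly. The notation below is introduced here for illustration and is not the authors': y_t is the contaminated series, x_t the original series, T the contaminated position, ω an additive outlier (an assumption about how the outlier enters), and φ̂ the coefficient obtained from the refit.

```latex
y_t = x_t + \omega\,\mathbb{1}\{t = T\}, \qquad e_t = y_t - \hat{\varphi}\, y_{t-1}
\\[4pt]
e_T = \left(x_T - \hat{\varphi}\, x_{T-1}\right) + \omega, \qquad
e_{T+1} = \left(x_{T+1} - \hat{\varphi}\, x_T\right) - \hat{\varphi}\,\omega
```

Under these assumptions the outlier also appears in the residual that follows it, scaled by −φ̂, and it influences φ̂ itself through the refit, which is the absorption effect attributed to Chang (1982) in the Introduction.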
Comparing the detection rates for positive and negative autocorrelations, it can be seen that the CCIO chart is more efficient for positive autocorrelation values, since the detection rates in these cases are slightly higher than those for the negative parameters for each outlier range. Table 2 shows the outlier detection rate in an AR(1) process by the EWMA control chart, with parameters λ = 0.2 and L = 2.86. The values are arranged as a function of the autocorrelation parameter φ and the outlier range. The data in Table 2 are illustrated in Fig. 4, where a detection efficiency chart was drawn for each range as a function of the autocorrelation parameter. Figure 4 shows distinct behaviour in outlier detection by the EWMA control chart when the positive and negative autocorrelation parameters are compared. It can be seen that for the small outlier ranges, 1σ and 1.5σ, the detection rate does not vary appreciably with the magnitude of the autocorrelation. This behaviour is similar to what happened in the CCIO chart. For the other ranges with positive autocorrelations, an increased detection power is observed as the correlation changes from moderate to strong. This behaviour does not occur in the CCIO chart, as seen previously. Moreover, the behaviour is reversed for negative autocorrelations; that is, as the strength of the autocorrelation increases, the detection power decreases.
Fig. 2 Flow chart of the methodological steps of the study
The data in Table 2 and the scales of the charts displayed in Fig. 4 show a great difference in the detection rate between the positive and negative parameters for the same outlier ranges. The same effect was seen in the CCIO chart, but the magnitude of the difference was much lower.

Moving average model: MA(1)
The methodological steps used in the moving average model are similar to those used in the autoregressive model to obtain the detection rates of the CCIO and EWMA control charts when varying the autocorrelation parameter and the outlier range. Table 3 displays the outlier detection rate in MA(1) processes by the CCIO control chart. The values are arranged as a function of the autocorrelation parameter and the outlier range. We can see in Table 3 that for an autocorrelation parameter of 0.5 and an outlier with a range of 1σ, the detection power was 0.0407; that is, the outlier was detected by the control chart in 407 of the 10,000 simulated series. Table 3 also shows that the autocorrelation affects the degree of outlier detection by the CCIO control chart, except for the small outlier ranges (1σ and 1.5σ), where the detection rate shows hardly any variation with the magnitude of the autocorrelation.
For moderate positive autocorrelation whose order is θ = 0.5, CCIO charts are considered to be more efficient in the detection when the outlier range varies between 2.5σ and 4.5σ, with a reduction in its value as the strength of the autocorrelation increases. The same occurs in negative autocorrelations, where for moderate values (θ = − 0.5), efficiency is higher in comparison with the value θ = − 0.8, which represents a strongly negative autocorrelated process. This behaviour is shown in Fig. 5, which illustrates a chart of detection efficiency as a function of the autocorrelation parameter for each outlier range.
The comparison of the detection rate between the positive and negative autocorrelations discloses that the CCIO chart is slightly more efficient for positive autocorrelation values, except for the range of 1σ. In this magnitude, a significant difference in the detection rate between the positive and negative autocorrelations cannot be said to exist without a detailed statistical analysis. Table 4 displays outlier detection rate in MA(1) processes by the EWMA control chart, with parameters λ = 0.2/L = 2.86. The values were arranged as a function of the variation of the autocorrelation parameter θ and the outlier range.
The data in Table 4 are displayed in Fig. 6, where an efficiency detection chart was designed for each outlier range, as a function of the autocorrelation parameter. The analysis of Fig. 6 shows that for the positive autocorrelations there is not a clear trend of the detection rate behaviour as the strength of the autocorrelation is increased. As mentioned above for the AR model, a possible cause may be the heterogeneity of the residuals, which affects the design of the EWMA control chart. For all the outlier ranges, the detection percentage for the negative autocorrelations had a decline when the autocorrelation ranged from − 0.5 to − 0.8.
The observation of the data in Table 4 and the scales of the charts in Fig. 6 enable the perception of a great difference in the detection rate between the positive and negative autocorrelation parameters for the same ranges of differences. The same effect was seen in the CCIO chart, but the magnitude of the difference had a much lower value.

Discussion
In this subsection, the data in Tables 1, 2, 3 and 4 are analysed by means of nonparametric statistics in order to provide a statistical basis for answering the questions posed in step S12 and to confirm some behaviours that were observed visually in the data tables, indicating which chart is more efficient under all the restrictions imposed in this study. Table 5 shows the results of the nonparametric statistical tests applied. A comparison of the results obtained for the AR(1) and MA(1) models shows that the behaviour concerning the efficiency in detecting an outlier previously inserted in these models is the same. When the CCIO and the EWMA charts were compared to check whether there is a significant difference in their detection power, the null hypothesis of equality between the two samples was rejected at a significance level of 5% in both models. Therefore, there is a significant difference between the two charts regarding the power of outlier detection; the CCIO chart has greater efficiency than the EWMA chart.
A comparison as to whether there is a difference between positive and negative autocorrelation parameters for both charts (CCIO and EWMA) and both models (AR and MA) shows that the behaviour for positive and negative autocorrelation parameters differs in each chart, with the EWMA chart showing a greater difference in detection. The value of the weighting constant λ influences how the EWMA chart performs relative to the CCIO chart: the greater the value of λ, the better the chart captures large discrepancies, and the more the EWMA chart behaves like a CCIO chart. Smaller values of the constant λ are able to capture small discrepancies. So, a compromise between the number of standard deviations from the central line (L) and the weighting constant (λ) must be chosen in order to make the average run lengths of the CCIO and EWMA charts comparable. When a comparison was made as to whether there is a difference in detection power between the different autocorrelation values for both models and charts, the null hypothesis of equality of the samples was not rejected at a significance level of 5%. Thus, there is no difference in outlier detection when the value of the correlation parameter varies.
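The role of λ and L can be made concrete with the standard textbook form of the EWMA chart. The symbols e_t (residual at time t), σ (residual standard deviation) and μ_0 (target, zero for residuals) are notation introduced here for illustration.

```latex
z_t = \lambda\, e_t + (1 - \lambda)\, z_{t-1}, \qquad z_0 = \mu_0
\\[4pt]
\mathrm{UCL}_t,\ \mathrm{LCL}_t \;=\; \mu_0 \,\pm\, L\,\sigma
\sqrt{\frac{\lambda}{2 - \lambda}\left[\,1 - (1 - \lambda)^{2t}\right]}
```

Setting λ = 1 gives z_t = e_t with limits μ_0 ± Lσ, which is the individuals chart when L = 3. With λ = 0.2 and L = 2.86, the limits settle at about ±0.953σ as t grows, since √(0.2/1.8) = 1/3.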

Conclusions
This article aimed to evaluate the application of forecasting models along with residual control charts to assess production processes whose samples are autocorrelated. The main objective was to determine the efficiency of control charts for individual observations (CCIO) and exponentially weighted moving average (EWMA) charts in detecting outliers in autocorrelated processes when they are applied to the residuals of AR(1) or MA(1) models. Results showed that for both the AR and MA models, the CCIO control chart is more efficient than the EWMA chart in the detection of outliers, even for low ranges. As shown by Montgomery (2004), the EWMA control chart is more effective in detecting small permanent process changes ranging from 1.5σ to 2σ, while the CCIO chart is more efficient in detecting major changes at the process level. However, as observed in this study, when the intention is to detect an outlier or an abrupt change in a process (represented by the change of only one sample of such process), the CCIO chart is more efficient for small and large ranges. One possible reason for the poor performance of the EWMA chart in detecting an outlier by means of the residuals lies in the low weight given to the current residual by the weighting constant λ. Thus, the effect of the outlier is "masked" when computing the EWMA statistic to be plotted on the control chart. A spike in the series of original data will result in a disruption in the EWMA statistic, but in the subsequent periods, the interference of the outlier disappears. However, when there is a change in the average of the process, the EWMA statistic tends to increase in the subsequent periods until it exceeds the control limits. For this reason, the EWMA control chart is more appropriate for detecting changes in the average, but is not recommended for detecting outliers.
When an evaluation was made to check whether there is a significant difference in detection between positive and negative autocorrelation parameters for models AR(1) and MA(1), both the CCIO and EWMA control charts behaved differently for positive and negative parameters, with a greater difference in the EWMA chart, since its p value is closer to the rejection region of the null hypothesis of the test. Although the magnitudes of the autocorrelations do not significantly affect detection rates, visual analysis of the data reveals a small variation in the CCIO charts and a more pronounced change in the EWMA charts. Another observed behaviour is related to the detection power of the EWMA chart when positive parameters are used in AR(1) processes. In all the other studied cases, for both positive and negative autocorrelations, detection decreases as the autocorrelation strength increases from 0.5 to 0.8. However, in the case mentioned, the behaviour is the opposite, with an increase in detection efficiency when the autocorrelation parameter varies from 0.5 to 0.8. A suggestion for future research is to adopt the mixed autoregressive integrated moving average (ARIMA) model to extract the residuals of an autocorrelated process and, thus, to study the performance of the control charts applied in this research and test whether heteroscedasticity can affect the design of the EWMA control chart.
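The masking effect described above can be illustrated with an idealized, noise-free example. The sketch below is not the study's simulation: it assumes residuals that are exactly zero except for either a single 4σ spike or a sustained 1.5σ shift, takes σ = 1, and uses the asymptotic EWMA limit L·σ·√(λ/(2 − λ)) instead of the time-varying limit. The function names are hypothetical.

```python
import math

LAMBDA = 0.2    # EWMA weighting constant used in the study
L_WIDTH = 2.86  # EWMA limit width used in the study
SIGMA = 1.0     # residual standard deviation (assumed known here)


def ewma(values, lam=LAMBDA, start=0.0):
    """EWMA statistic z_t = lam * e_t + (1 - lam) * z_{t-1}."""
    z, out = start, []
    for v in values:
        z = lam * v + (1.0 - lam) * z
        out.append(z)
    return out


def first_signal(values, limit):
    """1-based index of the first point with |value| > limit, or None."""
    for i, v in enumerate(values, start=1):
        if abs(v) > limit:
            return i
    return None


ccio_limit = 3.0 * SIGMA
ewma_limit = L_WIDTH * SIGMA * math.sqrt(LAMBDA / (2.0 - LAMBDA))  # about 0.953

# Case 1: an isolated 4-sigma spike at the 10th residual.
spike = [0.0] * 20
spike[9] = 4.0

# Case 2: a sustained 1.5-sigma shift starting at the 11th residual.
shift = [0.0] * 10 + [1.5] * 10

print(first_signal(spike, ccio_limit))        # 10: the individuals chart flags the spike
print(first_signal(ewma(spike), ewma_limit))  # None: the EWMA peaks at 0.8, below the limit
print(first_signal(shift, ccio_limit))        # None: 1.5 never exceeds 3
print(first_signal(ewma(shift), ewma_limit))  # 15: the EWMA accumulates the shift
```

In this idealized setting the spike contributes only λ × 4 = 0.8 to the EWMA statistic and then decays, while the sustained shift accumulates until it crosses the limit on its fifth shifted observation.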