Introduction

Process monitoring is a critical task in every industrial plant. It can be carried out using three principal approaches (Venkatasubramanian et al. 2003): (1) analytical methods based on mathematical models, which compare the outputs of the real system with those of the model; (2) knowledge-based methods (Stamatis 2003; Dhillon 2005), which draw on human expertise [risk analysis, failure modes, effects and criticality analysis (FMECA), decision trees]; and (3) data-based methods, which rely on a statistical description of the process. Methods of the last kind generally use control charts, such as the cumulative sum (CUSUM) chart (Page 1954) or the exponentially weighted moving average (EWMA) chart (Roberts 1959; Alt et al. 1985), for fault detection in industrial processes.

Manufacturing processes are becoming increasingly complex and multivariate. In such systems, the operator collects a vast amount of data to be analysed, and the high data volume and the large number of process variables make the operator's task tedious. Data-based methods are therefore better suited to process monitoring. Multivariate control charts [Hotelling \({T}^{2}\) control chart, multivariate CUSUM (MCUSUM), multivariate EWMA (MEWMA)] have been used to control multivariate processes and have proved adequate for reducing the complexity of monitoring them. Monitoring a multivariate process nevertheless remains a complex task, which can be divided into four subtasks: detection of an abnormal situation, diagnosis of the faults, identification of the variables involved in the faults and, finally, reconfiguration of the process (Venkatasubramanian et al. 2003).

Many studies have used control charts for process monitoring (Yu-Chang et al. 2015; Xia 2015; Ehsan and Sadigh 2014; Vijayababu and Rukmini 2014; Assareh et al. 2013). To identify the variables responsible for an out-of-control signal in \({T}^{2}\), a decomposition of the \({T}^{2}\) statistic into independent terms has been suggested by Jing et al. (2008). The “MYT approach” has been applied by Mani and Cooper (1999) for variable identification. Its major disadvantage is the number of \({T}^{2}\) decompositions: for a process with p variables, there are p! of them. To reduce this number and to identify the relationships among the variables, Bayesian networks have been applied to variable identification by Friedman (2000), Li et al. (2006), Li and Shi (2007) and Sylvain (2007).
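To give a sense of how quickly this number grows, the short Python sketch below evaluates p! for a few values of p. The function name `myt_decomposition_count` is an illustrative assumption and does not come from the cited works.

```python
import math


def myt_decomposition_count(p: int) -> int:
    """Number of T^2 decompositions for p variables, taken as p! as stated in the text."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    return math.factorial(p)


# Growth for a few small process sizes.
for p in (2, 5, 10):
    print(p, myt_decomposition_count(p))
```

Even at p = 10 the count already exceeds three million, which is why an exhaustive decomposition becomes impractical for large processes.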

In this paper, we combine all the tasks of multivariate process monitoring in one approach. Our contribution is to determine the best combination of multivariate control charts, neural networks, Bayesian networks and expert systems. The result of this research is a multi-agent system applied to multivariate process monitoring. This system uses a multivariate control chart for abnormality detection, a neural network for fault diagnosis, a Bayesian network for variable identification and an expert system for the reconfiguration task.

The rest of this paper is organized as follows. The process monitoring approach is presented, together with the monitoring algorithm, in “The proposed multi-agent system” section. In “Application of the proposed model on the Tennessee Eastman process” section, a case study on the simulated Tennessee Eastman process (TEP) (Downs and Vogel 1993) illustrates the validity of the proposed approach, including detection by the multivariate control charts executor agent (MCCEA), diagnosis by the diagnosis artificial neural network agent (DANNA), identification by the identification Bayesian network agent (IBNA) and reconfiguration by the reconfiguration agent (RA). Finally, conclusions are drawn and future work is suggested.

The proposed multi-agent system

The proposed multi-agent system combines several intelligences: a multivariate control chart, a neural network, a Bayesian network and an expert system. Multivariate control charts (\({T}^{2}\) control chart, MEWMA ...) can successfully detect process instability, but they can neither diagnose the fault that appeared in the process nor identify the causes of the instability. In this paper, we use an artificial neural network for fault diagnosis. Neural networks have demonstrated their ability to classify similar faults; they take time in the training phase, but classification is then fast. After the instability has been detected using the \({T}^{2}\) control chart and diagnosed using the neural network, the Bayesian network proposed by Sylvain (2007) is used in the identification task. To obtain a complete monitoring system for multivariate processes and to simplify the reconfiguration task for operators who are not specialists in the field, we developed an expert system that handles process correction. The following paragraphs describe each of these agents. The agent diagram of the proposed approach is shown in Fig. 1. In this diagram, the agent types are represented by circles, and the people who interact with the system are represented by the unified modelling language (UML) actor symbol.

Fig. 1 The agents diagram

The interface agent

The interface agent (IA) is a reactive agent that provides the interface for human users; it receives requests from the users (to monitor the process state) and transmits the agents' responses back to them. When the IA receives a request from the user about the process state, it sends a message to the MCCEA. If the process is under control, the IA displays the decision of the MCCEA to the operator. Otherwise, when the process is out of control, the IA waits for the response from the RA and displays it to the user.

The multivariate control chart executor agent

This agent is responsible for executing the multivariate control charts [\({T}^{2}\) control chart (Hotelling 1947), multivariate CUSUM (MCUSUM) (Pignatiello and Runger 1990), multivariate EWMA (MEWMA) (Lowry et al. 1992)]. These control charts (\({T}^{2}\), MEWMA and MCUSUM) can successfully detect process instability, but they give no information about the fault that appeared in the process or about the variables responsible for the instability. Moreover, a single chart is not sufficient to detect every out-of-control situation. To monitor the process successfully, we therefore suggest using a software agent that executes a set of multivariate control charts simultaneously and so detects process instability more easily. These different control charts are used in the design and implementation of the MCCEA.
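For reference, the statistics behind two of these charts can be written as follows in their usual textbook form. The notation is assumed here rather than taken from the text: \(\mathbf{x}_i\) is the i-th observation vector, \(\bar{\mathbf{x}}\) and \(\mathbf{S}\) are the in-control mean vector and covariance matrix, and \(\lambda \in (0, 1]\) is the MEWMA smoothing constant.

$$\begin{aligned} T_i^{2} = (\mathbf{x}_i-\bar{\mathbf{x}})^{\top}\,\mathbf{S}^{-1}\,(\mathbf{x}_i-\bar{\mathbf{x}}) \end{aligned}$$
$$\begin{aligned} \mathbf{z}_i = \lambda\,(\mathbf{x}_i-\bar{\mathbf{x}}) + (1-\lambda)\,\mathbf{z}_{i-1}, \qquad \mathbf{z}_0=\mathbf{0}, \qquad Q_i = \mathbf{z}_i^{\top}\,\boldsymbol{\Sigma}_{\mathbf{z}_i}^{-1}\,\mathbf{z}_i, \qquad \boldsymbol{\Sigma}_{\mathbf{z}_i} = \frac{\lambda}{2-\lambda}\left[ 1-(1-\lambda)^{2i}\right] \mathbf{S} \end{aligned}$$

Each chart signals when its statistic exceeds the corresponding control limit. With \(\lambda = 1\), the MEWMA statistic \(Q_i\) reduces to \(T_i^{2}\), which illustrates why the charts respond differently to small, persistent shifts and to large, sudden ones.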

The diagnosis artificial neural network agent

We use neural networks in the diagnosis task because they have demonstrated their efficiency in solving classification problems. In addition, once trained, a neural network has a short response time and a good classification rate. We create a classical multilayer perceptron (MLP) with three layers: (1) the input layer, in which the number of neurons is the number of process parameters; (2) the output layer, in which the number of neurons is the number of classes (faults of the process); and (3) the hidden layer, for which choosing the number of neurons is generally recognized as an open research problem. We carried out a set of tests and found that the optimal number is equal to (number of neurons in the input layer + number of neurons in the output layer)/2. This neural network is used in the implementation of the DANNA, which is therefore responsible for the diagnosis task in our system. When the process is out of control, DANNA receives a report from the MCCEA. Its principal objective is to find the fault that appeared in the process. It then sends a report to the IBNA.
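The sizing rule can be checked against the network dimensions reported later for the TEP case study (53 inputs with 15 or 3 outputs). This is a minimal sketch: the function name is illustrative, and the use of integer division is an assumption, since the text does not say how an odd sum would be rounded.

```python
def hidden_layer_size(n_inputs: int, n_outputs: int) -> int:
    """Rule reported in the text: (input neurons + output neurons) / 2.

    Integer division is an assumption for the case where the sum is odd.
    """
    return (n_inputs + n_outputs) // 2


N_MEASURED = 41     # XMEAS(1) to XMEAS(41)
N_MANIPULATED = 12  # XMV(1) to XMV(12)
n_inputs = N_MEASURED + N_MANIPULATED  # 53 process parameters

print(hidden_layer_size(n_inputs, 15))  # 34, network for the 15 known faults
print(hidden_layer_size(n_inputs, 3))   # 28, network for the three-fault case
```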

The identification Bayesian network agent

The IBNA receives a report from DANNA about the fault that appeared in the process. It builds a Bayesian network using the causal decomposition algorithm of \({T}^{2}\) proposed by Sylvain (2007) and finds the variables involved in the fault. This agent simplifies variable identification in the process. It then sends a report to the RA.

The reconfiguration agent

To combine all the process monitoring tasks (detection, diagnosis, identification and reconfiguration) in one system, we add the RA, which helps the operator to reconfigure the process after a failure. It receives a report from the IBNA about the variables involved in the fault and must propose a reconfiguration plan to the operator in order to maintain the process. It also sends its reconfiguration plan to the IA. This agent has been developed using expert system technology.

The proposed monitoring algorithm

Start
  Get data from the database
  Create the MCCEA
  MCCEA runs the control charts
  If (MCCEA-decision = stable-process) Then
    MCCEA sends report to the IA
  Else
    Create the DANNA
    Create the IBNA
    Create the RA
    DANNA creates the ANN using MLP
    IBNA creates the Bayesian net
    For (i = 1 to number of observations) Do
      DANNA gives its diagnosis of observation i
      DANNA sends the diagnosis to the IBNA
    End For
    IBNA receives the diagnosis from DANNA
    IBNA uses BN to find the variables that are out of control
    IBNA sends the report to the RA
    RA receives report about the variables involved in the fault
    RA finds the reconfiguration plan
    RA sends report to the IA
    IA receives report from RA
  End If
End
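The control flow of this algorithm can be sketched in Python as follows. This is only an illustration of the ordering of the four stages, not the authors' JADE implementation: the function `monitor`, its parameters and the toy stand-ins for the agents are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Observation = Sequence[float]


def monitor(
    observations: List[Observation],
    is_stable: Callable[[List[Observation]], bool],
    diagnose: Callable[[Observation], str],
    identify: Callable[[List[Observation], List[str]], List[str]],
    plan: Callable[[List[str]], str],
) -> Dict[str, object]:
    """Detection first; the other three stages run only when the process is unstable."""
    # MCCEA: run the control charts and decide whether the process is stable.
    if is_stable(observations):
        return {"state": "stable"}  # reported directly to the IA
    # DANNA: one diagnosis per observation.
    diagnoses = [diagnose(obs) for obs in observations]
    # IBNA: find the variables involved in the fault.
    variables = identify(observations, diagnoses)
    # RA: propose a reconfiguration plan, which the IA shows to the operator.
    return {
        "state": "unstable",
        "diagnoses": diagnoses,
        "variables": variables,
        "plan": plan(variables),
    }


# Toy stand-ins for the four agents (hypothetical logic and values).
report = monitor(
    observations=[[0.1, 0.2], [4.0, 0.3]],
    is_stable=lambda obs: all(abs(v) < 3.0 for row in obs for v in row),
    diagnose=lambda row: "fault" if max(abs(v) for v in row) >= 3.0 else "normal",
    identify=lambda obs, diags: ["var0"],
    plan=lambda variables: "inspect " + ", ".join(variables),
)
print(report["state"], report["plan"])
```

Passing the stages in as callables keeps the sketch independent of any particular chart, classifier or rule base, in the same way that the agents exchange only reports.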

Application of the proposed model on the Tennessee Eastman process

Introduction to the Tennessee Eastman process

The Tennessee Eastman process (TEP) was proposed by Downs and Vogel (1993) to provide a simulated model on which monitoring methods for complex industrial processes can be evaluated. The process consists of five principal units: a condenser, a separator, a reactor, a compressor and a stripper. Four gaseous reactants (A, C, D and E) and the inert B are fed to the reactor, which produces two components (G and H) and the undesired by-product F. The reactions are given in Eqs. (1)–(4). All the reactions are irreversible, exothermic and approximately first order with respect to the reactant concentrations. The reaction rates are expressed as Arrhenius functions of temperature. The reaction producing G has a higher activation energy than that producing H and is therefore more sensitive to temperature (Fig. 2).

The TEP proposed by Downs and Vogel (1993) is open-loop unstable and should be operated under closed-loop control. In this article, we use this control structure to evaluate the performance of our approach on fault diagnosis. The reactor product stream is cooled through a condenser and fed to a vapour–liquid separator. The vapour exits the separator and is recycled to the reactor feed through a compressor. A portion of the recycle stream is purged to prevent the inert and the by-product from accumulating. The condensed component from the separator is sent to a stripper, which strips the remaining reactants. Once G and H exit the base of the stripper, they are sent to a downstream process that is not included in the diagram. The inert and by-products are finally purged as vapour from the vapour–liquid separator. The process provides 41 measured and 12 manipulated variables, denoted XMEAS(1) to XMEAS(41) and XMV(1) to XMV(12), respectively. Their brief descriptions and units are listed in Tables 1 and 2. Fifteen preprogrammed faults, IDV(1) to IDV(15), represent different conditions of the process operation, as listed in Table 3.

Fig. 2 Tennessee Eastman control problem

$$\begin{aligned} A(g)+C(g)+D(g) \longrightarrow G(l) \end{aligned} \tag{1}$$
$$\begin{aligned} A(g)+C(g)+E(g) \longrightarrow H(l) \end{aligned} \tag{2}$$
$$\begin{aligned} A(g)+E(g) \longrightarrow F(l) \end{aligned} \tag{3}$$
$$\begin{aligned} 3D(g) \longrightarrow 2F(l) \end{aligned} \tag{4}$$
Table 1 Measurement variables in the Tennessee Eastman process
Table 2 Manipulated variables in the Tennessee Eastman process
Table 3 The known faults of the Tennessee Eastman process

Simulation and results analyses

The proposed approach has been implemented in Java using the NetBeans IDE, together with the agent design platform Java Agent DEvelopment Framework (JADE). Java offers many libraries that simplify the development of the neural network and the Bayesian network in NetBeans. Moreover, we use Jess Tab, which is a rule engine for the Java platform, to produce the rules in our knowledge base. In this work, we use the FIPA agent communication specifications, which deal with Agent Communication Language (ACL) messages, message exchange interaction protocols and content language representations.

In this section, we evaluate the performance of the proposed approach on a concrete example, the TEP. The data used comprise 480 training observations and 800 test observations for each fault, in addition to the normal period. The training observations were obtained by simulating each fault over a period of 24 h, and the test observations over a period of 40 h. Variables are sampled every 3 min.
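The two data set sizes follow directly from the simulation periods and the 3-min sampling interval, as the short check below shows. The helper name `observations_in` is illustrative.

```python
SAMPLING_INTERVAL_MIN = 3  # variables are sampled every 3 min


def observations_in(hours: int, interval_min: int = SAMPLING_INTERVAL_MIN) -> int:
    """Number of samples collected over a simulation period of the given length."""
    return hours * 60 // interval_min


print(observations_in(24))  # 480 training observations per fault
print(observations_in(40))  # 800 test observations per fault
```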

  • The Detection

    Authors who have worked on the TEP generally take a false alarm rate of \(0.01\%\). In this work, we use the \({T}^{2}\) control chart for instability detection. The performance of a detection system is evaluated by calculating its reliability (Kononenko 1991). The detection reliability is defined as the number of alerts obtained in the test period divided by the total number of samples in the test period.

    The MCCEA runs the \({T}^{2}\) control chart; if it detects an abnormal process state, it sends a message to the DANNA. The detection reliability obtained in this work is the same as that obtained by Sylvain (2007). Figure 3 shows the detection reliability of the MCCEA. Some faults are easily detectable [IDV (1), IDV (2), IDV (4), IDV (5), IDV (6), IDV (7), IDV (8), IDV (10), IDV (12), IDV (14)], whereas others are difficult to detect [IDV (3), IDV (9) and IDV (15)]. These last three faults are nearly identical, so the use of a single chart (in this work, the \({T}^{2}\) control chart) is not sufficient. Running several control charts simultaneously will increase the reliability of detection.

  • The Diagnosis

    This task is carried out by the DANNA. When it receives a message from the MCCEA that the process is not stable, it creates the neural network using an MLP in order to find the fault that appeared in the process. In the following paragraphs, we present the diagnosis obtained by the DANNA and compare the results with those of other classifiers proposed in the literature.

    Diagnosis of the known faults in the Tennessee Eastman problem

    We diagnosed all the faults, i.e. IDV (1) to IDV (15) in TEP, as shown in Fig. 4. The neural network used is an MLP with three layers:

  • The input layer contains 53 neurons that represent the process parameters,

  • The hidden layer contains 34 neurons [(number of neurons in the input layer + number of neurons in the output layer)/2],

  • The output layer contains 15 neurons that represent the process faults.

    Table 4 presents a comparison between the diagnosis produced by DANNA and some other approaches proposed for TEP fault diagnosis. Sylvain (2007) used a Bayesian network for classification, whereas PC1DARMF (Li and Xiao 2011) is a supervised pattern classification method that uses a one-dimensional adaptive rank-order morphological filter.

    Diagnosis of IDV (4), IDV (9) and IDV (15) in TEP

    The faults that are most difficult to classify in the TEP are IDV (4), IDV (9) and IDV (15). The neural network created for this case is composed of 53 neurons (TEP parameters) in the input layer, 28 neurons in the hidden layer and 3 neurons in the output layer. Table 5 presents the rate of correct classification of the faults IDV (4), IDV (9) and IDV (15) of the TEP, comparing the DANNA diagnosis with the approach proposed by El-Ferchichi (2013).

  • The Identification

    The IBNA is responsible for the identification task, which it performs using a Bayesian network. It receives a report from DANNA about the fault that appeared in the process. To develop the Bayesian network, Sylvain (2007) used the causal decomposition of \({T}^{2}\). Figure 4 presents the Bayesian network created under normal process operation. We take a false alarm rate of \(0.005\). The IBNA takes the observation that represents the fault and finds the variables involved in it; these are the variables with a probability value under 0.995. As an example, consider observation 240 of IDV (5), which was classified as IDV (4). The IBNA detects two variables with a probability value under 0.995, namely (XMV11) and (XMEAS21). The IBNA sends this variable identification to the RA.

  • The Reconfiguration

    The RA receives a report from the IBNA that identifies the variables causing the process instability. In our example, the identified variables are (XMV11) and (XMEAS21). The RA finds that the variable (XMV11) represents the liquid cooling flow to the condenser, whereas the variable (XMEAS21) represents the cooling liquid temperature at the reactor outlet. Since these two variables are involved in the fault IDV (5), the RA concludes that the fault that appeared in the process is IDV (5) and not IDV (4). It then proposes the reconfiguration plan to the operator. The development of this agent requires the knowledge of a human expert, which we will use to find the ideal reconfiguration plan.
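The identification and reconfiguration steps of this example can be sketched as a threshold filter followed by a rule lookup. The sketch is only illustrative: the probability values are hypothetical, (XMEAS9) is included purely as an example of a variable that stays above the threshold, and the rule base holds just the single association stated in the text, whereas a real expert system would contain many rules supplied by a human expert.

```python
from typing import Dict, Iterable, List, Optional

THRESHOLD = 0.995  # probability value quoted in the text


def involved_variables(probabilities: Dict[str, float],
                       threshold: float = THRESHOLD) -> List[str]:
    """IBNA step: keep the variables whose probability value is under the threshold."""
    return sorted(name for name, p in probabilities.items() if p < threshold)


# RA step: a single rule taken from the worked example in the text.
RULES = {frozenset({"XMV11", "XMEAS21"}): "IDV(5)"}


def confirm_fault(variables: Iterable[str]) -> Optional[str]:
    """Return the fault associated with this set of variables, or None if no rule matches."""
    return RULES.get(frozenset(variables))


# Hypothetical probability values for one observation.
probabilities = {"XMV11": 0.41, "XMEAS21": 0.87, "XMEAS9": 0.999}
variables = involved_variables(probabilities)
print(variables, confirm_fault(variables))
```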

Fig. 3 The detection reliability
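The detection reliability defined in the text (alerts obtained in the test period divided by the total number of samples in that period) can be computed as in the minimal sketch below. The alert sequence is hypothetical and does not reproduce any result of the paper.

```python
from typing import Sequence


def detection_reliability(alerts: Sequence[bool]) -> float:
    """Fraction of test samples on which the control chart raised an alert."""
    if not alerts:
        raise ValueError("the test period must contain at least one sample")
    return sum(alerts) / len(alerts)


# Hypothetical test period of 800 samples with 600 alerts.
alerts = [True] * 600 + [False] * 200
print(detection_reliability(alerts))  # 0.75
```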

Table 4 Classification rate of the known 15 faults in TEP
Table 5 Classification rate of IDV (4), IDV (9) and IDV (15) in TEP
Fig. 4 The Bayesian network used in the development of IBNA

Conclusion

An approach combining several intelligences has been proposed in this paper for multivariate process monitoring. In this approach, we use the tool best suited to each task in the monitoring of a complex process. We use multivariate control charts in the detection task and an artificial neural network classifier based on an MLP in the diagnosis task. For the identification task, we exploit the Bayesian network proposed by Sylvain (2007). Moreover, to help operators who are not specialists in the field to carry out corrective actions on the process, we suggest developing an expert system for the reconfiguration task. To make the proposed approach easy to use and efficient, we integrate the different subsystems (detection, diagnosis, identification and reconfiguration) in a single multi-agent system. The proposed model has been evaluated on a multivariate process, the Tennessee Eastman process.

The simulation results show that the proposed classifier performs well compared with other works applied to the Tennessee Eastman process. In addition, the proposed approach gives good results for each task in process monitoring. In the case study, we have seen that some faults are difficult to detect; our future work will concentrate on improving the detection task. The reconfiguration agent developed here handles the reconfiguration task for known faults, and we will also focus on adding reconfiguration plans for the case in which a new fault appears in the process.