Feature selection for fault detection systems: application to the Tennessee Eastman process

Abstract

In fault detection systems, a massive amount of data gathered over the life-cycle of equipment is often used to learn models or classifiers that aim at diagnosing different kinds of errors or failures. Within this large quantity of information, some features (or sets of features) are more correlated with one kind of failure than with others, and the presence of irrelevant features can degrade the performance of the classifier. Feature selection is therefore a key step in improving the performance of a detection system. In this paper, we propose an algorithm named STRASS, which aims at detecting the features that are relevant for classification. In some cases, when a strong correlation exists between sets of features and the associated class, conventional feature selection algorithms fail to select the most relevant features. To cope with this problem, the STRASS algorithm uses the k-way correlation between features and the class to select relevant features. To assess the performance of STRASS, we apply it to simulated data collected from the Tennessee Eastman chemical plant simulator. The Tennessee Eastman process (TEP) has been used in many fault detection studies, and three specific faults are not well discriminated by conventional algorithms. The results obtained by STRASS are compared to those obtained with reference feature selection algorithms. We show that the features selected by STRASS always improve the performance of a classifier compared to the whole set of original features, and that the resulting classification is better than that obtained with most of the other feature selection algorithms.
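The STRASS procedure itself is not reproduced in this preview, but the idea of a k-way association between a feature subset and the class can be illustrated with a small, hedged sketch: below, the relevance of a subset is scored by the mutual information between the jointly encoded subset and the class labels. This is only an illustration of why joint (k-way) scores can detect dependencies that per-feature scores miss; the function names and the exhaustive search are ours, not the paper's.

```python
import numpy as np
from itertools import combinations

def entropy(labels):
    """Empirical Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def joint_code(X_subset):
    """Encode each row of a discrete feature subset as one joint symbol."""
    return np.array(["|".join(map(str, row)) for row in X_subset])

def k_way_relevance(X, y, subset):
    """I(X_subset ; y): information the subset, taken jointly, carries about the class."""
    s = joint_code(X[:, list(subset)])
    sy = joint_code(np.column_stack([s, y.astype(str)]))
    return entropy(y) + entropy(s) - entropy(sy)

def best_k_subset(X, y, k):
    """Exhaustively score every k-feature subset (illustration only, exponential cost)."""
    scored = {s: k_way_relevance(X, y, s) for s in combinations(range(X.shape[1]), k)}
    return max(scored, key=scored.get), scored
```

On parity-like data (Appendix B), every single feature carries essentially no information about the class, while a well-chosen subset of three features carries the full bit of information; this is exactly the situation where purely pairwise filters fail.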

Notes

  1. http://penglab.janelia.org/software/

References

  1. Agrawal R, Ghosh S, Imielinski T, Iyer B, Swami A (1992) An interval classifier for database mining applications. In: Proceedings of the 18th International Conference on Very Large Data Bases (VLDB '92), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 560–573

  2. Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In: Proceedings of the Ninth National Conference on Artificial Intelligence, AAAI Press, pp 547–552

  3. Almuallim H, Dietterich TG (1994) Learning Boolean concepts in the presence of many irrelevant features. Artif Intell 69:279–305

  4. Bache K, Lichman M (2013) UCI machine learning repository

  5. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97:245–271

  6. Casillas J, Cordón O, del Jesus MJ, Herrera F (2000) Genetic feature selection in a fuzzy rule-based classification system learning process for high dimensional problems

  7. Casimir R, Boutleux E, Clerc G, Yahoui A (2006) The use of features selection and nearest neighbors rule for faults diagnostic in induction motors. Eng Appl Artif Intell 19(2):169–177

  8. Chebel-Morello B, Michaut D, Baptiste P (2001) A knowledge discovery process for a flexible manufacturing system. In: Proceedings of the 8th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA 2001), vol 1, pp 651–658

  9. Chiang LH, Kotanchek ME, Kordon AK (2004) Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput Chem Eng 28(8):1389–1401

  10. Cui P, Li J, Wang G (2008) Improved kernel principal component analysis for fault detection. Expert Syst Appl 34(2):1210–1219

  11. Dash M, Liu H, Motoda H (2000) Consistency based feature selection. In: Terano T, Liu H, Chen A (eds) Knowledge Discovery and Data Mining, Current Issues and New Applications. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, pp 98–109

  12. Downs J, Vogel E (1993) A plant-wide industrial process control problem. Comput Chem Eng 17(3):245–255

  13. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1-3):389–422

  14. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18

  15. Hall MA (2000) Correlation-based feature selection for discrete and numeric class machine learning. Morgan Kaufmann, pp 359–366

  16. Jack L, Nandi A (2000) Genetic algorithms for feature selection in machine condition monitoring with vibration signals. IEE Proc Vis Image Signal Process 147(3):205–212

  17. Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, AAAI’92, pp 129–134

  18. Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. Springer-Verlag, pp 171–182

  19. Kononenko I, Simec E, Robnik-Sikonja M (1997) Overcoming the myopia of inductive learning algorithms with ReliefF, vol 7, pp 39–55

  20. Langley P, Sage S (1997) Computational learning theory and natural learning systems: Volume IV. MIT Press, Cambridge, MA, USA, chap Scaling to Domains with Irrelevant Features, pp 51–63

  21. Lanzi PL (1997) Fast feature selection with genetic algorithms: a filter approach. In: Proceedings of the 1997 IEEE International Conference on Evolutionary Computation, pp 537–540

  22. Liu H, Motoda H (1998) Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Norwell MA, USA

  23. Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502

  24. Marcotorchino F (1984) Utilisation des comparaisons par paires en statistique des contingences. Centre scientifique IBM Paris Etudes F-069, F-071, F-081

  25. Michaut D (1999) Filtering and variable selection in learning processes. PhD thesis, University of Franche-Comté

  26. Narendra PM, Fukunaga K (1977) A branch and bound algorithm for feature subset selection. IEEE Trans Comput 26(9):917–922

  27. Noruzi Nashalji M, Aliyari Shoorehdeli M, Teshnehlab M (2010) Fault detection of the Tennessee Eastman process using improved PCA and neural classifier. In: Gao XZ, Gaspar-Cunha A, Köppen M, Schaefer G, Wang J (eds) Soft Computing in Industrial Applications. Springer, Berlin

  28. Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226–1238

  29. Ricker NL (1996) Decentralized control of the Tennessee Eastman challenge process. J Process Control 6(4):205–221

  30. Riverol C, Carosi C (2008) Integration of fault diagnosis based on case-based reasoning principles in brewing. Sens & Instrumen Food Qual 2(1):15–20

  31. Senoussi H, Chebel-Morello B (2008) A new contextual based feature selection. In: Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN 2008, IEEE World Congress on Computational Intelligence), pp 1265–1272

  32. Sugumaran V, Muralidharan V, Ramachandran K (2007) Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing, vol 21, pp 930–942

  33. Thrun S, Bala J, Bloedorn E, Bratko I, Cestnik B, Cheng J, Jong KD, Dzeroski S, Hamann R, Kaufman K, Keller S, Kononenko I, Kreuziger J, Michalski R, Mitchell T, Pachowicz P, Roger B, Vafaie H, de Velde WV, Wenzel W, Wnek J, Zhang J (1991) The MONK's problems: a performance comparison of different learning algorithms. Tech. Rep. CMU-CS-91-197, Carnegie Mellon University, Computer Science Department, Pittsburgh, PA

  34. Tibshirani R (1994) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288

  35. Torkkola K, Venkatesan S, Liu H (2004) Sensor selection for maneuver classification. In: Proceedings of the 7th IEEE International ITSC Conference

  36. Tyan CY, Wang PP, Bahler DR (1996) An application on intelligent control using neural network and fuzzy logic. Neurocomputing 12(4):345–363

  37. Verron S, Tiplica T, Kobi A (2008) Fault detection and identification with a new feature selection based on mutual information. J Process Control 18(5):479–490

  38. Wang L, Yu J (2005) Fault feature selection based on modified binary PSO with mutation and its application in chemical process fault diagnosis. In: Wang L, Chen K, Ong Y (eds) Advances in Natural Computation, Lecture Notes in Computer Science, vol 3612. Springer, Berlin Heidelberg, pp 832–840

  39. Widodo A, Yang BS (2007) Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors. Expert Syst Appl 33(1):241–250

  40. Yang BS, Widodo A (2008) Support vector machine for machine fault diagnosis and prognosis. J Syst Des Dynamics 2:12–23

  41. Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224

  42. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320

Author information

Corresponding author

Correspondence to Simon Malinowski.

Appendices

Appendix A: The Tennessee Eastman Process

The Tennessee Eastman Process (TEP) is a chemical process created by the Eastman Chemical Company to provide a realistic industrial test case for evaluating process control and monitoring methods [12]. The process was simulated in Matlab by Ricker [29], and this simulator was used to generate overlapping data sets on which the classification performance is evaluated. Figure 5 shows a flow sheet of the TEP. It comprises four unit operations: an exothermic two-phase reactor, a flash separator, a reboiler stripper, and a recycle compressor. The process produces two products (G and H) and one undesired by-product (F) from four reactants (A, C, D and E). It has 12 input variables and 41 output variables; only 52 variables are taken into account in this problem because one of the input variables (the reactor agitator speed) is constant. The system has fifteen types of identified faults. In this paper, we consider only three of them: faults 4, 9 and 11, described in Table 8.

Fig. 5 Process flow sheet of the TEP

Table 8 Description of the faults used in this paper
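As a hedged sketch of how the simulated data can be turned into a classification problem, the snippet below stacks the 52 monitored variables for faults 4, 9 and 11 and trains a baseline classifier on all features. The file names and the k-nearest-neighbours classifier are assumptions made for illustration, not the exact pipeline used in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

FAULTS = [4, 9, 11]            # the three faults studied in this paper
X_parts, y_parts = [], []
for f in FAULTS:
    # Hypothetical export of the TEP simulator: one (n_samples, 52) array per fault.
    data = np.load(f"tep_fault_{f:02d}.npy")
    X_parts.append(data)
    y_parts.append(np.full(len(data), f))

X = np.vstack(X_parts)
y = np.concatenate(y_parts)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("accuracy with all 52 variables:", accuracy_score(y_te, clf.predict(X_te)))
```

Feature selection would then simply replace X with X[:, selected] before training, so that the effect of the selected subset on the classification accuracy can be compared with the full 52-variable baseline.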

Appendix B: Synthetic data

This appendix describes the synthetic data sets used in this paper for simulation purposes. The LED display domain data set is available from the UCI machine learning repository [4].

The MONK’s problems [33] are composed of three target concepts:

MONK-1: $(x_1 = x_2) \vee (x_3 = 1)$

MONK-2: exactly two of $\{x_1 = 1, x_2 = 1, x_3 = 1, x_4 = 1, x_5 = 1, x_6 = 1\}$

MONK-3: $(x_5 = 3 \wedge x_4 = 1) \vee (x_5 \neq 4 \wedge x_2 \neq 3)$

The BOOL data set is composed of a function of six Boolean features giving a Boolean class, for instance $y_{class} = (x_1 \wedge x_2) \vee (x_3 \wedge x_4) \vee (x_5 \wedge x_6)$. Six other randomly generated Boolean features are added to these features.

The Parity data set is composed of a function of three Boolean features, $y_{class} = x_1 \oplus x_2 \oplus x_3$ (the parity of the three bits). Seven randomly generated Boolean features are added. This data set is particularly interesting because none of the relevant features, taken separately, can be distinguished from the irrelevant ones (see the sketch below).
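A minimal sketch of how such a data set can be generated (assuming the parity, i.e. XOR, reading of the class given above); it also shows that each feature taken alone is essentially uncorrelated with the class, while the three relevant ones jointly determine it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

X_rel = rng.integers(0, 2, size=(n, 3))      # three relevant Boolean features
y = X_rel.sum(axis=1) % 2                    # class = parity (XOR) of the three bits
X_irr = rng.integers(0, 2, size=(n, 7))      # seven irrelevant Boolean features
X = np.hstack([X_rel, X_irr])

# Individually, every feature (relevant or not) is nearly uncorrelated with y ...
print([round(abs(np.corrcoef(X[:, j], y)[0, 1]), 3) for j in range(X.shape[1])])
# ... yet the three relevant features jointly determine y exactly.
print(np.array_equal(X[:, :3].sum(axis=1) % 2, y))
```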

The Parity2 data set is the Parity data set with two redundant features added: $x_{11} = x_1$ and $x_{12} = x_2$. It makes it possible to test an algorithm's ability to cope with redundant features.

The Coral data set is composed of six binary features $x_1$ to $x_6$, among which $x_5$ is irrelevant and $x_6$ is 75 % correlated with the class $y_{class} = (x_1 \wedge x_2) \vee (x_3 \wedge x_4)$.
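A sketch of one possible generator for Coral, assuming that "75 % correlated" means that $x_6$ agrees with the class label on about 75 % of the instances; this construction is ours, not necessarily the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

x = rng.integers(0, 2, size=(n, 5))              # x1..x5 (x5 plays no role in the class)
y = (x[:, 0] & x[:, 1]) | (x[:, 2] & x[:, 3])    # y_class = (x1 AND x2) OR (x3 AND x4)

flip = rng.random(n) < 0.25                      # disagree with the class 25 % of the time
x6 = np.where(flip, 1 - y, y)

X = np.column_stack([x, x6])
print((x6 == y).mean())                          # close to 0.75
```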

Agrawal's functions are a series of classification functions of increasing complexity that use nine features to classify people into different groups. More details can be found in [1].

Cite this article

Chebel-Morello, B., Malinowski, S. & Senoussi, H. Feature selection for fault detection systems: application to the Tennessee Eastman process. Appl Intell 44, 111–122 (2016). https://doi.org/10.1007/s10489-015-0694-6
