Abstract
In fault detection systems, the massive amount of data gathered over the life cycle of equipment is often used to learn models or classifiers that aim at diagnosing different kinds of errors or failures. Within this huge quantity of information, some features (or sets of features) are more correlated with one kind of failure than with another. The presence of irrelevant features may degrade the performance of the classifier, so feature selection is a key step in improving a detection system. We propose in this paper an algorithm named STRASS, which aims at detecting the features relevant for classification. In certain cases, when a strong correlation exists between some features and the associated class, conventional feature selection algorithms fail to select the most relevant features. To cope with this problem, STRASS uses k-way correlation between features and the class to select relevant features. To assess the performance of STRASS, we apply it to simulated data collected from the Tennessee Eastman chemical plant simulator. The Tennessee Eastman process (TEP) has been used in many fault detection studies, and three specific faults are not well discriminated by conventional algorithms. The results obtained by STRASS are compared to those obtained with reference feature selection algorithms. We show that the features selected by STRASS always improve the performance of a classifier compared to the whole set of original features, and that the resulting classification is better than with most of the other feature selection algorithms.
References
Agrawal R, Ghosh S, Imielinski T, Iyer B, Swami A (1992) An interval classifier for database mining applications. In: Proceedings of the 18th International Conference on Very Large Data Bases (VLDB '92), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 560–573
Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In: Proceedings of the Ninth National Conference on Artificial Intelligence, AAAI Press, pp 547–552
Almuallim H, Dietterich TG (1994) Learning boolean concepts in the presence of many irrelevant features. Artif Intell 69:279–305
Bache K, Lichman M (2013) UCI machine learning repository
Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97:245–271
Casillas J, Cordón O, del Jesus MJ, Herrera F (2000) Genetic feature selection in a fuzzy rule-based classification system learning process for high-dimensional problems
Casimir R, Boutleux E, Clerc G, Yahoui A (2006) The use of features selection and nearest neighbors rule for faults diagnostic in induction motors. Eng Appl Artif Intell 19(2):169–177
Chebel-Morello B, Michaut D, Baptiste P (2001) A knowledge discovery process for a flexible manufacturing system. In: Proceedings of the 8th IEEE International Conference on Emerging Technologies and Factory Automation, vol 1, pp 651–658
Chiang LH, Kotanchek ME, Kordon AK (2004) Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput Chem Eng 28(8):1389–1401
Cui P, Li J, Wang G (2008) Improved kernel principal component analysis for fault detection. Expert Syst Appl 34(2):1210–1219
Dash M, Liu H, Motoda H (2000) Consistency based feature selection. In: Terano T, Liu H, Chen A (eds) Knowledge Discovery and Data Mining, Current Issues and New Applications. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, pp 98–109
Downs J, Vogel E (1993) A plant-wide industrial process control problem. Comput Chem Eng 17(3):245–255
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1-3):389–422
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18
Hall MA (2000) Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann, pp 359–366
Jack L, Nandi A (2000) Genetic algorithms for feature selection in machine condition monitoring with vibration signals. IEE Proc Vis Image Signal Process 147(3):205–212
Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, AAAI’92, pp 129–134
Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Proceedings of the European Conference on Machine Learning (ECML-94), Springer-Verlag, pp 171–182
Kononenko I, Simec E, Robnik-Sikonja M (1997) Overcoming the myopia of inductive learning algorithms with ReliefF. Appl Intell 7(1):39–55
Langley P, Sage S (1997) Scaling to domains with irrelevant features. In: Computational Learning Theory and Natural Learning Systems: Volume IV. MIT Press, Cambridge, MA, USA, pp 51–63
Lanzi PL (1997) Fast feature selection with genetic algorithms: a filter approach. In: Proceedings of the IEEE International Conference on Evolutionary Computation, pp 537–540
Liu H, Motoda H (1998) Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Norwell MA, USA
Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502
Marcotorchino F (1984) Utilisation des comparaisons par paires en statistique des contingences. Centre scientifique IBM Paris Etudes F-069, F-071, F-081
Michaut D (1999) Filtering and variable selection in learning processes. PhD thesis, Université de Franche-Comté
Narendra PM, Fukunaga K (1977) A branch and bound algorithm for feature subset selection. IEEE Trans Comput 26(9):917–922
Noruzi Nashalji M, Aliyari Shoorehdeli M, Teshnehlab M (2010) Fault detection of the Tennessee Eastman process using improved PCA and neural classifier. In: Gao XZ, Gaspar-Cunha A, Köppen M, Schaefer G, Wang J (eds) Soft Computing in Industrial Applications. Springer
Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226–1238
Ricker NL (1996) Decentralized control of the Tennessee Eastman challenge process. J Process Control 6(4):205–221
Riverol C, Carosi C (2008) Integration of fault diagnosis based on case-based reasoning principles in brewing. Sens & Instrumen Food Qual 2(1):15–20
Senoussi H, Chebel-Morello B (2008) A new contextual based feature selection. In: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2008, IEEE World Congress on Computational Intelligence), pp 1265–1272
Sugumaran V, Muralidharan V, Ramachandran K (2007) Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing. Mech Syst Signal Process 21(2):930–942
Thrun S, Bala J, Bloedorn E, Bratko I, Cestnik B, Cheng J, Jong KD, Dzeroski S, Hamann R, Kaufman K, Keller S, Kononenko I, Kreuziger J, Michalski R, Mitchell T, Pachowicz P, Roger B, Vafaie H, de Velde WV, Wenzel W, Wnek J, Zhang J (1991) The MONK's problems: a performance comparison of different learning algorithms. Tech. Rep. CMU-CS-91-197, Carnegie Mellon University, Computer Science Department, Pittsburgh, PA
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288
Torkkola K, Venkatesan S, Liu H (2004) Sensor selection for maneuver classification. In: Proceedings of the 7th IEEE International ITSC Conference
Tyan CY, Wang PP, Bahler DR (1996) An application on intelligent control using neural network and fuzzy logic. Neurocomputing 12(4):345–363
Verron S, Tiplica T, Kobi A (2008) Fault detection and identification with a new feature selection based on mutual information. J Process Control 18(5):479–490
Wang L, Yu J (2005) Fault feature selection based on modified binary PSO with mutation and its application in chemical process fault diagnosis. In: Wang L, Chen K, Ong Y (eds) Advances in Natural Computation. Lecture Notes in Computer Science, vol 3612. Springer, Berlin, Heidelberg, pp 832–840
Widodo A, Yang BS (2007) Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors. Expert Syst Appl 33(1):241–250
Yang BS, Widodo A (2008) Support vector machine for machine fault diagnosis and prognosis. J Syst Des Dyn 2:12–23
Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320
Appendices
Appendix A: The Tennessee Eastman Process
The Tennessee Eastman Process (TEP) is a chemical process created by the Eastman Chemical Company to provide a realistic industrial test case for evaluating process control and monitoring methods [12]. This process was simulated in Matlab by Ricker [29]. The simulator was used to generate overlapping data sets to evaluate the classification performance. Figure 5 shows a flow sheet of the TEP. There are four unit operations: an exothermic two-phase reactor, a flash separator, a reboiler stripper, and a recycle compressor. The TEP produces two products (G and H) and one undesired by-product (F) from four reactants (A, C, D and E). The process has 12 input variables and 41 output variables. Only 52 variables are taken into account in this problem because one of the input variables (the reactor agitator speed) is constant. The system has fifteen types of identified faults. In this paper, we consider only three of them: faults 4, 9 and 11, described in Table 8.
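Since the only discarded variable is constant, the reduction from 53 recorded variables to 52 can be reproduced generically by dropping zero-variance columns. The sketch below is illustrative only; the function name and the array layout (samples × variables) are assumptions, not part of the original simulator:

```python
import numpy as np

def drop_constant_columns(X):
    """Remove zero-variance columns (e.g. the constant reactor agitator speed
    in the TEP data); return the reduced matrix and the dropped column indices."""
    keep = X.std(axis=0) > 0          # True for columns that actually vary
    return X[:, keep], np.flatnonzero(~keep)
```

Applied to the raw TEP measurements, this would leave the 52 variables considered in this paper.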
Appendix B: Synthetic data
We describe in this appendix the synthetic data sets used in this paper for simulation purposes. The LED display domain data set is available on the UCI data set repository [4].
The MONK’s problems [33] are composed of three target concepts:
MONK-1: (x₁ = x₂) ∨ (x₃ = 1)
MONK-2: exactly two of {x₁ = 1, x₂ = 1, …, x₆ = 1}
MONK-3: (x₅ = 3 ∧ x₄ = 1) ∨ (x₅ ≠ 4 ∧ x₂ ≠ 3)
The BOOL data set is built from a function of six Boolean features giving a Boolean class, for instance y_class = (x₁ ⊕ x₂) ∨ (x₃ ∧ x₄) ∨ (x₅ ∧ x₆). Six other randomly generated Boolean features are added to these features.
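A generator for this data set can be sketched as follows; the function name and the i.i.d. uniform sampling of the Boolean features are assumptions made for illustration, since the paper does not specify the sampling scheme:

```python
import random

def make_bool_dataset(n_samples, seed=0):
    """BOOL data set sketch: 6 informative Boolean features x1..x6 define the
    class y = (x1 XOR x2) OR (x3 AND x4) OR (x5 AND x6); 6 randomly generated
    Boolean features x7..x12 are appended as irrelevant noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        x = [rng.randint(0, 1) for _ in range(12)]  # 6 informative + 6 random
        X.append(x)
        y.append((x[0] ^ x[1]) | (x[2] & x[3]) | (x[4] & x[5]))
    return X, y
```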
The Parity data set is built from a function of three Boolean features, y_class = x₁ ⊕ x₂ ⊕ x₃. Seven randomly generated Boolean features are added. This data set is particularly interesting because no relevant feature taken separately can be distinguished from the irrelevant ones.
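This property can be checked directly: over the full truth table, the mutual information between any single relevant feature and the parity class is zero, while the three features jointly determine it completely. A standard-library sketch (the helper below is illustrative, not part of STRASS):

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """I(X;Y) in bits, estimated from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Full truth table of the Parity concept y = x1 XOR x2 XOR x3.
table = [((x1, x2, x3), x1 ^ x2 ^ x3) for x1, x2, x3 in product((0, 1), repeat=3)]

single = mutual_information([(x[0], y) for x, y in table])  # one feature alone
joint = mutual_information([(x, y) for x, y in table])      # all three together
print(single, joint)  # → 0.0 1.0
```

A single feature carries 0 bits about the class, whereas the three relevant features together carry the full 1 bit, which is why per-feature relevance scores fail here.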
The Parity2 data set is the Parity data set to which two redundant features are added: x₁₁ = x₁ and x₁₂ = x₂. This data set makes it possible to test the ability of the algorithms to deal with redundant features.
The Coral data set is composed of six binary features x₁ to x₆, among which x₅ is irrelevant and x₆ is correlated at 75 % with the class y_class = (x₁ ∧ x₂) ∨ (x₃ ∧ x₄).
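One plausible way to realize the stated 75 % correlation is to copy the class into x₆ and flip it with probability 0.25; the generator below is a hypothetical sketch under that assumption (the function name and sampling scheme are not specified in the paper):

```python
import random

def make_coral(n_samples, seed=0):
    """Coral data set sketch: x1..x4 define the class, x5 is irrelevant noise,
    and x6 equals the class except for a 25% chance of being flipped."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        x1, x2, x3, x4, x5 = (rng.randint(0, 1) for _ in range(5))
        y = (x1 & x2) | (x3 & x4)
        x6 = y if rng.random() < 0.75 else 1 - y
        rows.append(((x1, x2, x3, x4, x5, x6), y))
    return rows
```

With a large enough sample, x₆ agrees with the class on roughly 75 % of the rows, which is the trap this data set sets for purely pairwise selection criteria.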
Agrawal’s functions are a series of classification functions of increasing complexity that use nine features to classify people into different groups. More details can be found in [1].
Chebel-Morello, B., Malinowski, S. & Senoussi, H. Feature selection for fault detection systems: application to the Tennessee Eastman process. Appl Intell 44, 111–122 (2016). https://doi.org/10.1007/s10489-015-0694-6