Abstract
Defect detection is among the most critical challenges in modern manufacturing science. The state of the art in this field involves generating only a few defects per million opportunities. Process Monitoring for Quality is a big-data-driven philosophy aimed at rare-event detection applied to quality control. This is accomplished through binary classification and empirical knowledge discovery via feature interpretation, which is facilitated by feature selection methods. These analytical tools help identify the driving features of a system, which are then used in a manufacturing context to plan and design randomized experiments for determining optimal process parameters. This work presents a new filter-type feature selection method based on the separation between classes. As shown in previous studies, predictive ability is strongly correlated with the distribution of margins. Because manufacturing-derived data sets for binary quality classification tend to be highly or ultra-unbalanced, the proposed method is designed to analyze these data structures effectively. The method's properties and its ability to select high-quality features are illustrated through three case studies. First, virtual features are used to demonstrate the method's procedure. The method is then applied to a manufacturing-derived data set, from which the most relevant feature is identified and used for process redesign. Finally, five of the most widely used feature selection methods are compared with the proposed method on publicly available data sets. The empirical results demonstrate a significantly improved prediction ability of the features selected by the proposed method.
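The core idea of a filter-type method of this kind — scoring each feature by how well it separates the two quality classes and ranking accordingly — can be sketched as follows. This is an illustrative stand-in, not the paper's exact margin-based statistic: it uses a Fisher-score-style separation measure (squared difference of class means over the sum of class variances), and the synthetic 990/10 class split, variable names, and shift magnitude are assumptions made for the example.

```python
import numpy as np

def separation_scores(X, y):
    """Score each feature by a simple between-class separation statistic.

    For each column of X, compute the squared difference of the class
    means divided by the sum of the class variances (a Fisher-score-style
    measure). Higher scores indicate better separation between classes.
    """
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

# Ultra-unbalanced synthetic data: 990 good parts, 10 defective parts.
rng = np.random.default_rng(0)
X_good = rng.normal(0.0, 1.0, size=(990, 5))
X_bad = rng.normal(0.0, 1.0, size=(10, 5))
X_bad[:, 2] += 4.0  # only feature 2 actually separates the classes
X = np.vstack([X_good, X_bad])
y = np.concatenate([np.zeros(990, dtype=int), np.ones(10, dtype=int)])

scores = separation_scores(X, y)
ranking = np.argsort(scores)[::-1]  # best-separating feature first
print(ranking[0])
```

Because the score is computed per class rather than over the pooled sample, the ten defective parts are not swamped by the majority class — the same motivation the abstract gives for designing a method specifically for highly unbalanced data.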
Change history
07 April 2022
A Correction to this paper has been published: https://doi.org/10.1007/s12008-022-00871-8
Notes
This method has been applied to many publicly available data sets as well as to many manufacturing-derived private GM data sets.
No probability distributions were used to generate those points.
Although all methods were applied with heuristically set hyperparameters, the authors acknowledge the limitations of these results, given that a comprehensive comparison across multiple hyperparameter values is infeasible.
Acknowledgements
The authors would like to acknowledge the technical and financial support of Writing Lab, TecLabs, Tecnológico de Monterrey, México, in the production of this work. Also, special thanks to Dr. Jeffrey A. Abell for supporting the application of PMQ across General Motors.
The original online version of this article was revised: Corrected version of abstract updated.
Cite this article
Escobar Diaz, C.A., Arinez, J., Macías Arregoyta, D. et al. Process monitoring for quality–a feature selection method for highly unbalanced binary data. Int J Interact Des Manuf 16, 557–572 (2022). https://doi.org/10.1007/s12008-021-00817-6