Towards big industrial data mining through explainable automated machine learning

Garouani, Moncef; Ahmad, Adeel; Bouneffa, Mourad; Hamlich, Mohamed; Bourguin, Gregory; Lewandowski, Arnaud

doi:10.1007/s00170-022-08761-9

ORIGINAL ARTICLE
Published: 10 February 2022

Volume 120, pages 1169–1188 (2022)

The International Journal of Advanced Manufacturing Technology

Moncef Garouani¹,²,³,
Adeel Ahmad¹,
Mourad Bouneffa¹,
Mohamed Hamlich²,
Gregory Bourguin¹ &
…
Arnaud Lewandowski¹

Abstract

Industrial systems can produce large amounts of data. These data are often distributed and stored in heterogeneous formats, yet they can be mined for information that supports the deployment of intelligent management tools for production activities. Doing so requires knowledge extraction and prediction processes built on Artificial Intelligence (AI) models, but selecting and configuring such models is increasingly complex for non-expert users. In this paper, we present an approach and a software platform that may allow industrial actors, who are usually not familiar with AI, to select and configure algorithms suited to their needs. The approach is based on automated machine learning. The resulting platform supports a better choice among combinations of AI algorithms and hyperparameter configurations. It also provides explainability features for the resulting algorithms and models, thus increasing the acceptability of these models among practitioners. The proposed approach has been applied in the field of predictive maintenance. Current tests are based on the analysis of more than 360 databases from that field.
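To make the idea of choosing among combinations of algorithms and hyperparameter configurations concrete, the following is a minimal sketch and not the platform's implementation. It evaluates every (algorithm, hyperparameters) pair in a small joint search space on held-out data and keeps the best one. The toy sensor data, the two toy classifiers, and all names (`make_data`, `SEARCH_SPACE`, `select_configuration`) are assumptions introduced purely for illustration; an exhaustive search is used here for simplicity, whereas an automated machine learning system would typically use a smarter strategy.

```python
import random

def make_data(n, seed):
    # Hypothetical sensor readings (temperature, vibration) with a failure label.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temp, vib = rng.uniform(0, 1), rng.uniform(0, 1)
        rows.append(((temp, vib), int(temp + vib > 1.0)))
    return rows

def knn_predict(train, x, k):
    # Majority vote among the k nearest training points (squared Euclidean distance).
    nearest = sorted(
        train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x))
    )[:k]
    return int(sum(label for _, label in nearest) * 2 > k)

def stump_predict(train, x, feature, threshold):
    # Fixed single-feature threshold rule; it does not use the training data.
    return int(x[feature] > threshold)

# Joint search space: (algorithm name, predict function, hyperparameters).
SEARCH_SPACE = (
    [("knn", knn_predict, {"k": k}) for k in (1, 3, 5, 7)]
    + [("stump", stump_predict, {"feature": f, "threshold": t})
       for f in (0, 1) for t in (0.25, 0.5, 0.75)]
)

def accuracy(predict, params, train, valid):
    # Fraction of validation points predicted correctly.
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)

def select_configuration(train, valid):
    # Score every (algorithm, hyperparameters) pair and return the best one.
    scored = [
        (accuracy(fn, params, train, valid), name, params)
        for name, fn, params in SEARCH_SPACE
    ]
    return max(scored, key=lambda s: s[0])

if __name__ == "__main__":
    train, valid = make_data(200, seed=0), make_data(100, seed=1)
    score, name, params = select_configuration(train, valid)
    print(f"best: {name} {params} (validation accuracy {score:.2f})")
```

The non-expert user only supplies data; the search over algorithms and their settings happens behind one call. Explaining why a configuration was recommended, which the abstract also mentions, is outside the scope of this sketch.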

Availability of data and materials

All data generated or analyzed during this study are included in this paper.

Code availability

The software code is available in the study's GitHub repository: https://github.com/LeMGarouani/AMLBID.

Acknowledgements

The authors thank the Université du Littoral Côte d’Opale (ULCO), France; the School of engineering’s and business’ sciences and technics (HESTIM), Morocco; and CNRST, Morocco, for their partial financial support. They also thank all the participants involved in the system evaluation for their constructive discussions and valuable suggestions.

Author information

Authors and Affiliations

Univ. Littoral Cote d’Opale, UR 4491, LISIC, Laboratoire d’Informatique Signal et Image de la Cote d’Opale, F-62100, Calais, France
Moncef Garouani, Adeel Ahmad, Mourad Bouneffa, Gregory Bourguin & Arnaud Lewandowski
CCPS Laboratory, ENSAM, University of Hassan II, Casablanca, Morocco
Moncef Garouani & Mohamed Hamlich
Study and Research Center for Engineering and Management (CERIM), HESTIM, Casablanca, Morocco
Moncef Garouani

Corresponding author

Correspondence to Moncef Garouani.

Ethics declarations

Ethics approval

All authors confirm that this article does not have any academic ethics issues and strictly follows the journal submission rules.

Consent to participate

All authors agree to participate in the research work of this paper and publish it in the International Journal of Advanced Manufacturing Technology.

Consent for publication

All authors agree to publish this article in the International Journal of Advanced Manufacturing Technology.

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 11, 12, 13, 14, 15, 16, and 17 and Fig. 12

Table 11 SVM hyperparameters tuned in the experiments

Table 12 Random Forest & Extra Trees hyperparameters tuned in the experiments

Table 13 AdaBoost hyperparameters tuned in the experiments

Table 14 Decision Trees hyperparameters tuned in the experiments

Table 15 Logistic Regression hyperparameters tuned in the experiments

Table 16 SGD Classifier hyperparameters tuned in the experiments

Table 17 Gradient Boosting hyperparameters tuned in the experiments

About this article

Cite this article

Garouani, M., Ahmad, A., Bouneffa, M. et al. Towards big industrial data mining through explainable automated machine learning. Int J Adv Manuf Technol 120, 1169–1188 (2022). https://doi.org/10.1007/s00170-022-08761-9

Received: 28 July 2021
Accepted: 14 January 2022
Published: 10 February 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s00170-022-08761-9
