
Towards big industrial data mining through explainable automated machine learning

  • ORIGINAL ARTICLE
The International Journal of Advanced Manufacturing Technology

Abstract

Industrial system resources can produce large amounts of data. These data are often distributed and stored in heterogeneous formats, yet they offer a means of mining information that can support the deployment of intelligent management tools for production activities. This requires knowledge extraction and prediction processes built on Artificial Intelligence (AI) models, but selecting and configuring such models is becoming increasingly complex for non-expert users. In this paper, we present an approach and a software platform that may allow industrial actors, who are usually not familiar with AI, to select and configure algorithms optimally adapted to their needs. The approach is essentially based on automated machine learning. The resulting platform supports a better choice among combinations of AI algorithms and hyper-parameter configurations. It also provides explainability features for the resulting algorithms and models, which increases the acceptability of these models among practitioners. The proposed approach has been applied in the field of predictive maintenance. Current tests are based on the analysis of more than 360 databases from this field.
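To make the idea of choosing among combinations of algorithms and hyper-parameter configurations concrete, the following is a minimal sketch in plain Python. It is not the platform described in the paper: the synthetic data, the two toy classifiers (`knn`, `stump`), the search space, and every function name are assumptions introduced purely for illustration. The sketch scores each (algorithm, configuration) pair by cross-validation, keeps the best one, and then reports a simple permutation importance as one possible form of explanation.

```python
import random

def make_data(n=120, seed=0):
    """Synthetic two-class data (assumption): feature 0 drives the label, feature 1 is noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        y = int(x[0] + rng.gauss(0, 0.1) > 0.5)
        rows.append((x, y))
    return rows

def knn(train, x, k):
    """k-nearest-neighbours majority vote (toy classifier)."""
    nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > k)

def stump(train, x, feature, threshold):
    """Fixed-threshold decision stump (toy classifier, ignores the training set)."""
    return int(x[feature] > threshold)

MODELS = {"knn": knn, "stump": stump}

# Hypothetical search space: every (algorithm, hyper-parameter configuration) pair to try.
SEARCH_SPACE = (
    [("knn", {"k": k}) for k in (1, 5, 15)]
    + [("stump", {"feature": f, "threshold": t}) for f in (0, 1) for t in (0.3, 0.5, 0.7)]
)

def accuracy(train, test, name, params):
    predict = MODELS[name]
    return sum(predict(train, x, **params) == y for x, y in test) / len(test)

def cv_accuracy(data, name, params, folds=4):
    """Mean accuracy over interleaved cross-validation folds."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [r for j, r in enumerate(data) if j % folds != i]
        scores.append(accuracy(train, test, name, params))
    return sum(scores) / folds

def select_best(data):
    """Return (score, algorithm, configuration) with the highest cross-validated accuracy."""
    scored = [(cv_accuracy(data, n, p), n, p) for n, p in SEARCH_SPACE]
    return max(scored, key=lambda s: s[0])

def permutation_importance(data, name, params, feature, seed=1):
    """Held-out accuracy lost when one feature column is shuffled."""
    train, test = data[::2], data[1::2]
    column = [x[feature] for x, _ in test]
    random.Random(seed).shuffle(column)
    shuffled = [(x[:feature] + [v] + x[feature + 1:], y) for (x, y), v in zip(test, column)]
    return accuracy(train, test, name, params) - accuracy(train, shuffled, name, params)

data = make_data()
best_score, best_name, best_params = select_best(data)
importances = [permutation_importance(data, best_name, best_params, f) for f in (0, 1)]
print(best_name, best_params, round(best_score, 3), importances)
```

An exhaustive loop is used here only because the search space is tiny; automated machine learning systems typically replace it with a more economical search strategy, and the explanation step can be any method suited to the selected model.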

Availability of data and materials

All data generated or analyzed during this study are included in this paper.

Code availability

Software code is included in the study github repository: https://github.com/LeMGarouani/AMLBID.

Notes

  1. https://archive.ics.uci.edu/

  2. https://www.openml.org/

  3. https://www.kaggle.com/

  4. https://sci2s.ugr.es/keel/

  5. https://github.com/LeMGarouani/AMLBID

Acknowledgements

The authors thank the Université du Littoral Côte d’Opale (ULCO), France, School of engineering’s and business’ sciences and technics (HESTIM), Morocco and CNRST Morocco for the partial financial support, and all the participants involved in the system evaluation for their constructive discussions and valuable suggestions.

Author information

Corresponding author

Correspondence to Moncef Garouani.

Ethics declarations

Ethics approval

All authors confirm that this article does not have any academic ethics issues and strictly follows the journal submission rules.

Consent to participate

All authors agree to participate in the research work of this paper and publish it in the International Journal of Advanced Manufacturing Technology.

Consent for publication

All authors agree to publish this article in the International Journal of Advanced Manufacturing Technology.

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 11, 12, 13, 14, 15, 16, and 17 and Fig. 12

Table 11 SVM hyperparameters tuned in the experiments
Table 12 Random Forest and Extra Trees hyperparameters tuned in the experiments
Table 13 AdaBoost hyperparameters tuned in the experiments
Table 14 Decision Trees hyperparameters tuned in the experiments
Table 15 Logistic Regression hyperparameters tuned in the experiments
Table 16 SGD Classifier hyperparameters tuned in the experiments
Table 17 Gradient Boosting hyperparameters tuned in the experiments
Fig. 12 The Post-Study System Usability Questionnaire

Cite this article

Garouani, M., Ahmad, A., Bouneffa, M. et al. Towards big industrial data mining through explainable automated machine learning. Int J Adv Manuf Technol 120, 1169–1188 (2022). https://doi.org/10.1007/s00170-022-08761-9

  • DOI: https://doi.org/10.1007/s00170-022-08761-9
