Abstract
Industrial systems can produce large amounts of data. These data are often distributed and stored in heterogeneous formats, yet they provide a means of mining information that can support the deployment of intelligent management tools for production activities. For this purpose, knowledge extraction and prediction processes must be implemented using Artificial Intelligence (AI) models, but selecting and configuring suitable AI models is becoming increasingly complex for non-expert users. In this paper, we present an approach and a software platform that may allow industrial actors, who are usually not familiar with AI, to select and configure algorithms optimally adapted to their needs. The approach is essentially based on automated machine learning. The resulting platform effectively enables a better choice among combinations of AI algorithms and hyper-parameter configurations. It also provides explainability features for the resulting algorithms and models, thus increasing the acceptability of these models within the user community. The proposed approach has been applied in the field of predictive maintenance. Current tests are based on the analysis of more than 360 databases from this field.
Availability of data and materials
All data generated or analyzed during this study are included in this paper.
Code availability
Software code is available in the study's GitHub repository: https://github.com/LeMGarouani/AMLBID.
Acknowledgements
The authors thank the Université du Littoral Côte d’Opale (ULCO), France, School of engineering’s and business’ sciences and technics (HESTIM), Morocco and CNRST Morocco for the partial financial support, and all the participants involved in the system evaluation for their constructive discussions and valuable suggestions.
Ethics declarations
Ethics approval
All authors confirm that this article does not have any academic ethics issues and strictly follows the journal submission rules.
Consent to participate
All authors agree to participate in the research work of this paper and publish it in the International Journal of Advanced Manufacturing Technology.
Consent for publication
All authors agree to publish this article in the International Journal of Advanced Manufacturing Technology.
Conflicts of interest
The authors declare that they have no conflict of interest.
Cite this article
Garouani, M., Ahmad, A., Bouneffa, M. et al. Towards big industrial data mining through explainable automated machine learning. Int J Adv Manuf Technol 120, 1169–1188 (2022). https://doi.org/10.1007/s00170-022-08761-9
DOI: https://doi.org/10.1007/s00170-022-08761-9