Abstract
This study introduces a novel approach that represents machine-generated audio signals as three-dimensional tensors suitable as input to a three-dimensional convolutional neural network. The method first computes the reconstructed phase space of the audio signal and then converts the resulting three-dimensional reconstructed phase space into a three-dimensional tensor. This representation captures nonlinear dynamic features and uncovers hidden system variables, with discriminative information encoded in the shape of the data cloud within the tensors, which can improve the detection and classification of anomalous sound patterns. The tensors are then fed to a three-dimensional deep convolutional neural network that classifies the audio signals. To assess the method, we evaluate it on three benchmark datasets, MFPT, MIMII, and ToyADMOS, using a 5-fold cross-validation scheme. Sensitivity, specificity, accuracy, and F1 score are reported so that performance is examined across different machine types and acoustic environments. The experiments yielded an average accuracy of 97.63% on the MFPT dataset. On the MIMII dataset, the slider machinery achieved the highest average accuracy, 92.02%, and the pump machinery the lowest, 90.54%. On the ToyADMOS dataset, the average accuracy was approximately 94%. These findings indicate the method's potential for accurately detecting anomalies across various machine types and acoustic environments.
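The two preprocessing steps named in the abstract, phase-space reconstruction followed by conversion to a three-dimensional tensor, can be illustrated with a minimal sketch. It assumes a time-delay (Takens-style) embedding and a simple occupancy-grid voxelization. The function names, the delay of 9 samples, the 32-bin grid, and the synthetic test signal are illustrative assumptions, not values or procedures taken from the paper.

```python
import numpy as np


def reconstruct_phase_space(signal, delay, dim=3):
    """Time-delay embedding: row n is [x[n], x[n+delay], ..., x[n+(dim-1)*delay]]."""
    x = np.asarray(signal, dtype=float)
    n_points = x.size - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    # Each column is the signal shifted by a multiple of the delay.
    return np.stack(
        [x[i * delay : i * delay + n_points] for i in range(dim)], axis=1
    )


def rps_to_tensor(points, bins=32):
    """Bin the 3-D point cloud into a bins^3 occupancy grid scaled to [0, 1]."""
    lo, hi = points.min(), points.max()
    if hi == lo:  # constant signal: avoid a zero-width histogram range
        hi = lo + 1.0
    # The same value range is used on every axis so the cloud's shape is preserved.
    hist, _ = np.histogramdd(points, bins=bins, range=[(lo, hi)] * points.shape[1])
    return (hist / hist.max()).astype(np.float32)


# Synthetic stand-in for a machine sound frame (assumed 16 kHz sampling rate).
t = np.arange(4000) / 16000.0
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1230 * t)

cloud = reconstruct_phase_space(x, delay=9)  # delay chosen arbitrarily here
tensor = rps_to_tensor(cloud, bins=32)       # candidate input for a 3D CNN
print(cloud.shape, tensor.shape)
```

In practice the delay and embedding dimension are usually estimated from the data, for example with mutual information [8] and false nearest neighbours [16], rather than fixed by hand as in this sketch.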
Data availability
The experiments were performed on three publicly available datasets: the MFPT fault dataset [6], the malfunctioning industrial machine investigation and inspection (MIMII) dataset [30], and ToyADMOS [18], available at the following links: https://www.mfpt.org/fault-data-sets/, http://www.zenodo.org/record/3384388#.Y4SKr3bP1D8, https://paperswithcode.com/dataset/toyadmos
References
Bogdanov D, Wack N, Gómez E, Gulati S, Herrera P, Mayor O, Roma G, Salamon J, Zapata J, Serra X (2013) ESSENTIA: an Audio Analysis Library for Music Information Retrieval. 14th International Society for Music Information Retrieval Conference, Curitiba
Chollet F (2017) Deep learning with python. Manning Publications
Coupé P, Mansencal B, Clément M, Giraud R, Denis de Senneville B, Ta V-T, Lepetit V, Manjon JV (2020) AssemblyNet: A large ensemble of CNNs for 3D whole brain MRI segmentation. NeuroImage 219:117026. https://doi.org/10.1016/j.neuroimage.2020.117026
Eyben F, Wöllmer M, Schuller B (2010) Opensmile. Proceedings of the 18th ACM International Conference on Multimedia. https://doi.org/10.1145/1873951.1874246
Farahani M, Behnam A, Ahmadian A (2021) Comparison of feature selection methods in diagnosing Alzheimer’s disease. J Med Signals Sensors 11(2):82–90. https://doi.org/10.4103/jmss.JMSS_57_20
Fault data sets (2017) https://www.mfpt.org/fault-data-sets/
Fengqi W, Meng G (2006) Compound rub malfunctions feature extraction based on full-spectrum cascade analysis and SVM. Mech Syst Signal Process 20(8):2007–2021. https://doi.org/10.1016/j.ymssp.2005.10.004
Fraser AM, Swinney HL (1986) Independent coordinates for strange attractors from mutual information. Phys Rev A Gen Phys 33(2):1134–1140. https://doi.org/10.1103/physreva.33.1134
Gribbestad M, Hassan MU, Hameed IA, Sundli K (2021) Health Monitoring of Air Compressors Using Reconstruction-Based Deep Learning for Anomaly Detection with Increased Transparency. Entropy 23(1):83. https://www.mdpi.com/1099-4300/23/1/83
Halder S, Bhat S, Dora BK (2022) Inverse thresholding to spectrogram for the detection of broken rotor bar in induction motor. Measurement 198:111400. https://doi.org/10.1016/j.measurement.2022.111400
Hamel P, Eck D (2010) Learning Features from Music Audio with Deep Belief Networks. ISMIR
Harimi A, Fakhr HS, Bakhshi A (2016) Recognition of emotion using reconstructed phase space of speech. Malaysian J Comput Sci 29(4):262–271. https://doi.org/10.22452/mjcs.vol29no4.2
Hong G, Suh D (2021) Supervised-Learning-Based Intelligent Fault Diagnosis for Mechanical Equipment. IEEE Access 9:116147–116162. https://doi.org/10.1109/ACCESS.2021.3104189
Jombo G, Zhang Y (2023) Acoustic-based machine condition monitoring—methods and challenges. Eng 4(1):47–79. https://www.mdpi.com/2673-4117/4/1/4
Justus V, Kanagachidambaresan (2022) Intelligent single-board computer for industry 4.0: Efficient real-time monitoring system for anomaly detection in CNC machines. Microprocess Microsyst 93(104629):104629. https://doi.org/10.1016/j.micpro.2022.104629
Kennel MB, Brown R, Abarbanel HD (1992) Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys Rev A 45(6):3403–3411. https://doi.org/10.1103/physreva.45.3403
Khurana U, Samulowitz H, Turaga D (2018) Feature engineering for predictive modeling using reinforcement learning. Proc Conf AAAI Artif Intell 32(1). https://doi.org/10.1609/aaai.v32i1.11678
Koizumi Y, Saito S, Uematsu H, Harada N, Imoto K (2019) ToyADMOS: A dataset of miniature-machine operating sounds for anomalous sound detection. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). https://doi.org/10.1109/waspaa.2019.8937164
Kovács PP, Schimmel J (2016) Higher-dimensional signal processing for vibrational analysis. In Proceedings of the 43rd International Congress on Noise Control Engineering (pp. 3086–3093)
Krajewski J, Schnieder S, Sommer D, Batliner A, Schuller B (2012) Applying multiple classifiers and non-linear dynamics features for detecting sleepiness from speech. Neurocomputing 84:65–75. https://doi.org/10.1016/j.neucom.2011.12.021
Langone R, Alzate C, De Ketelaere B, Vlasselaer J, Meert W, Suykens JAK (2015) LS-SVM based spectral clustering and regression for predicting maintenance of industrial machines. Eng Appl Artif Intell 37:268–278. https://doi.org/10.1016/j.engappai.2014.09.008
Lartillot O, Toiviainen P (2007) MIR in Matlab (II): A Toolbox for Musical Feature Extraction from Audio. ISMIR
Lathrop D (2015) Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering (review of Steven H. Strogatz, 2nd ed., Westview Press, 2015, ISBN 978–0–813–34910–7). Phys Today 68(4):54–55. https://doi.org/10.1063/pt.3.2751
Lei X, Ji H, Xu Q, Ye T, Zhang S, Huang C (2022) Research on data diagnosis method of acoustic array sensor device based on spectrogram. Glob Energy Interconnect 5(4):418–433. https://doi.org/10.1016/j.gloei.2022.08.008
Liu C, Feng L, Liu G, Wang H, Liu S (2021) Bottom-up broadcast neural network for music genre classification. Multimed Tools Appl 80(5):7313–7331. https://doi.org/10.1007/s11042-020-09643-6
Ma H-G, Han C-Z (2006) Selection of embedding dimension and delay time in phase space reconstruction. Front Electr Electron Eng China 1(1):111–114. https://doi.org/10.1007/s11460-005-0023-7
Meyer A, Chlebus G, Rak M, Schindele D, Schostak M, van Ginneken B, Schenk A, Meine H, Hahn HK, Schreiber A, Hansen C (2021) Anisotropic 3D Multi-Stream CNN for Accurate Prostate Segmentation from Multi-Planar MRI. Comput Methods Programs Biomed 200:105821. https://doi.org/10.1016/j.cmpb.2020.105821
Nair V, Hinton G (2010) Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa
Park Y-J, Fan S-KS, Hsu C-Y (2020) A review on fault detection and process diagnostics in industrial processes. Processes (Basel) 8(9):1123. https://doi.org/10.3390/pr8091123
Purohit H, Tanabe R, Ichige T, Endo T, Nikaido Y, Suefusa K, Kawaguchi Y (2019) MIMII dataset: Sound dataset for malfunctioning industrial machine investigation and inspection. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019). https://doi.org/10.33682/m76f-d618
Shah A, Mizuno A, Linghai W, Weinstein A, Aizenstein H (2021) Prediction of cognitive function based on structural MRI images using a 3D convolutional neural net (CNN) among cognitively normal older adults. Biol Psychiatry 89(9, Supplement):S372. https://doi.org/10.1016/j.biopsych.2021.02.925
Shahzadi A, Ahmadyfard A, Harimi A, Yaghmaie K (2015) Speech emotion recognition using nonlinear dynamics features. TURK J Electr Eng Comput Sci 23:2056–2073. https://doi.org/10.3906/elk-1302-90
Shahzadi A, Ahmadyfard A, Yaghmaie K, Harimi A (2013) Recognition of emotion in speech using spectral patterns. Malaysian J Comput Sci 26(2):140–158. https://ejournal.um.edu.my/index.php/MJCS/article/view/6767
Shin J, Lee S (2023) Robust and lightweight deep learning model for industrial fault diagnosis in low-quality and noisy data. Electronics 12(2):409. https://www.mdpi.com/2079-9292/12/2/409
Sousa R, Antunes J, Coutinho F, Silva E, Santos J, Ferreira H (2019) Robust cepstral-based features for anomaly detection in ball bearings. Int J Adv Manuf Technol 103(5–8):2377–2390. https://doi.org/10.1007/s00170-019-03597-2
Srinivasu PN, JayaLakshmi G, Jhaveri RH, Praveen SP (2022) Ambient assistive living for monitoring the physical activity of diabetic adults through body area networks. Mob Inf Syst 2022:3169927. https://doi.org/10.1155/2022/3169927
Tagawa Y, Maskeliūnas R, Damaševičius R (2021) Acoustic anomaly detection of mechanical failures in noisy real-life factory environments. Electronics 10(19):2329. https://www.mdpi.com/2079-9292/10/19/2329
Takens F (1981) Detecting strange attractors in turbulence. Dynamical Systems and Turbulence, Warwick 1980, Berlin, Heidelberg
Tama BA, Vania M, Kim I, Lim S (2022) An EfficientNet-Based Weighted Ensemble Model for Industrial Machine Malfunction Detection Using Acoustic Signals. IEEE Access 10:34625–34636. https://doi.org/10.1109/ACCESS.2022.3160179
Wang L, Sun G, Wang Y, Ma J, Zhao X, Liang R (2022) AFExplorer: Visual analysis and interactive selection of audio features. Vis Inform 6(1):47–55. https://doi.org/10.1016/j.visinf.2022.02.003
Wang Y, Chen X, Jiang C (2019) Multidimensional representation learning for audio signal processing. In Proceedings of the 2019 International Joint Conference on Neural Networks (pp 1–7)
Yu H, Wang K, Li Y, He M (2021) Deep subclass reconstruction network for fault diagnosis of rotating machinery under various operating conditions. Appl Soft Comput 112(107755):107755. https://doi.org/10.1016/j.asoc.2021.107755
Yu L, Yao X, Yang J, Li C (2020) Gear fault diagnosis through vibration and acoustic signal combination based on convolutional neural network. Information 11(5)
Zabin M, Choi H-J, Uddin J (2022) Hybrid deep transfer learning architecture for industrial fault diagnosis using Hilbert transform and DCNN–LSTM. J Supercomput. https://doi.org/10.1007/s11227-022-04830-8
Zheng F, Zhang G, Song Z (2001) Comparison of different implementations of MFCC. J Comput Sci Technol 16(6):582–589. https://doi.org/10.1007/bf02943243
Ethics declarations
Ethics approval
The authors did not receive support from any organization for the submitted work.
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Khanjari, M., Azarfar, A., Abardeh, M.H. et al. Anomalous sound detection for machine condition monitoring using 3D tensor representation of sound and 3D deep convolutional neural network. Multimed Tools Appl 83, 44101–44119 (2024). https://doi.org/10.1007/s11042-023-17043-9
DOI: https://doi.org/10.1007/s11042-023-17043-9