Anomalous sound detection for machine condition monitoring using 3D tensor representation of sound and 3D deep convolutional neural network

Khanjari, Mohsen; Azarfar, Azita; Abardeh, Mohamad Hosseini; Alibeiki, Esmail

doi:10.1007/s11042-023-17043-9

Published: 17 October 2023

Multimedia Tools and Applications, Volume 83, pages 44101–44119 (2024)

Abstract

This study introduces an approach that represents machine-generated audio signals as three-dimensional tensors suitable as input to a three-dimensional convolutional neural network. The method first computes the reconstructed phase space of the audio signal and then converts the resulting three-dimensional reconstructed phase space into a three-dimensional tensor. This representation captures nonlinear dynamic features and uncovers hidden system variables, with valuable information encoded in the shape of the data cloud within the tensors, which can improve discrimination and classification and enable accurate detection of anomalous sound patterns. The tensors are then fed to a three-dimensional deep convolutional neural network that analyzes and classifies the audio signals. To assess the method, we conduct an evaluation on three benchmark datasets, MFPT, MIMII, and ToyADMOS, using 5-fold cross-validation. Performance is measured by Sensitivity, Specificity, Accuracy, and F1 Score across datasets that cover different machine types and acoustic environments. The experiments yielded an average accuracy of 97.63% on the MFPT dataset. On the MIMII dataset, the slider machinery achieved the highest average accuracy, 92.02%, and the pump machinery the lowest, 90.54%. On the ToyADMOS dataset, the average accuracy was approximately 94%. These findings underscore the method's potential for accurately detecting anomalies across various machine types and acoustic environments.
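
The two preprocessing steps named in the abstract, phase-space reconstruction followed by conversion to a three-dimensional tensor, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the delay, the grid resolution, or how the point cloud is mapped to tensor cells, so the delay of 40 samples, the 32-bin grid, the occupancy-count voxelization, and the function names `reconstruct_phase_space` and `phase_space_to_tensor` are all assumptions made for illustration. A synthetic sine wave stands in for a machine recording.

```python
import numpy as np


def reconstruct_phase_space(signal, delay, dim=3):
    """Time-delay embedding: row i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    x = np.asarray(signal, dtype=float)
    n_points = x.size - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    # One column per delayed copy of the signal.
    return np.stack([x[k * delay : k * delay + n_points] for k in range(dim)], axis=1)


def phase_space_to_tensor(points, bins=32):
    """Voxelize a 3D point cloud into a (bins, bins, bins) occupancy tensor.

    Assumption: each cell holds the number of trajectory points that fall
    inside it, scaled so that the fullest cell equals 1.
    """
    lo, hi = points.min(), points.max()
    if hi == lo:  # constant signal: widen the range so binning stays valid
        hi = lo + 1.0
    counts, _ = np.histogramdd(points, bins=bins, range=[(lo, hi)] * 3)
    return counts / counts.max()


# Toy input (assumption): a 50 Hz sine sampled at 8 kHz for 0.5 s.
t = np.arange(4000) / 8000.0
signal = np.sin(2 * np.pi * 50 * t)

points = reconstruct_phase_space(signal, delay=40)  # hand-picked delay
tensor = phase_space_to_tensor(points, bins=32)
```

In this sketch the delay is chosen by hand as a quarter of the sine period; for real recordings it would have to be estimated from the data. Indexing the result as `tensor[np.newaxis, np.newaxis]` adds batch and channel axes, which is the input layout that channels-first 3D convolution layers typically expect.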

Data availability

The experiments were performed on three publicly available datasets: the MFPT (Machinery Failure Prevention Technology) dataset [6], the malfunctioning industrial machine investigation and inspection (MIMII) dataset [30], and ToyADMOS [18]. They are available at the following links: https://www.mfpt.org/fault-data-sets/, http://www.zenodo.org/record/3384388#.Y4SKr3bP1D8, https://paperswithcode.com/dataset/toyadmos

References

1. Bogdanov D, Wack N, Gómez E, Gulati S, Herrera P, Mayor O, Roma G, Salamon J, Zapata J, Serra X (2013) ESSENTIA: an Audio Analysis Library for Music Information Retrieval. 14th International Society for Music Information Retrieval Conference, Curitiba
2. Chollet F (2017) Deep learning with Python. Manning Publications
3. Coupé P, Mansencal B, Clément M, Giraud R, Denis de Senneville B, Ta V-T, Lepetit V, Manjon JV (2020) AssemblyNet: A large ensemble of CNNs for 3D whole brain MRI segmentation. NeuroImage 219:117026. https://doi.org/10.1016/j.neuroimage.2020.117026
4. Eyben F, Wöllmer M, Schuller B (2010) Opensmile. Proceedings of the 18th ACM international conference on Multimedia. https://doi.org/10.1145/1873951.1874246
5. Farahani M, Behnam A, Ahmadian A (2021) Comparison of feature selection methods in diagnosing Alzheimer’s disease. J Med Signals Sensors 11(2):82–90. https://doi.org/10.4103/jmss.JMSS_57_20
6. Fault data sets (2017) https://www.mfpt.org/fault-data-sets/
7. Fengqi W, Meng G (2006) Compound rub malfunctions feature extraction based on full-spectrum cascade analysis and SVM. Mech Syst Signal Process 20(8):2007–2021. https://doi.org/10.1016/j.ymssp.2005.10.004
8. Fraser AM, Swinney HL (1986) Independent coordinates for strange attractors from mutual information. Phys Rev A Gen Phys 33(2):1134–1140. https://doi.org/10.1103/physreva.33.1134
9. Gribbestad M, Hassan MU, Hameed IA, Sundli K (2021) Health Monitoring of Air Compressors Using Reconstruction-Based Deep Learning for Anomaly Detection with Increased Transparency. Entropy 23(1):83. https://www.mdpi.com/1099-4300/23/1/83
10. Halder S, Bhat S, Dora BK (2022) Inverse thresholding to spectrogram for the detection of broken rotor bar in induction motor. Measurement 198:111400. https://doi.org/10.1016/j.measurement.2022.111400
11. Hamel P, Eck D (2010) Learning Features from Music Audio with Deep Belief Networks. ISMIR
12. Harimi A, Fakhr HS, Bakhshi A (2016) Recognition of emotion using reconstructed phase space of speech. Malaysian J Comput Sci 29(4):262–271. https://doi.org/10.22452/mjcs.vol29no4.2
13. Hong G, Suh D (2021) Supervised-Learning-Based Intelligent Fault Diagnosis for Mechanical Equipment. IEEE Access 9:116147–116162. https://doi.org/10.1109/ACCESS.2021.3104189
14. Jombo G, Zhang Y (2023) Acoustic-based machine condition monitoring—methods and challenges. Eng 4(1):47–79. https://www.mdpi.com/2673-4117/4/1/4
15. Justus V, Kanagachidambaresan (2022) Intelligent single-board computer for industry 4.0: Efficient real-time monitoring system for anomaly detection in CNC machines. Microprocess Microsyst 93:104629. https://doi.org/10.1016/j.micpro.2022.104629
16. Kennel MB, Brown R, Abarbanel HD (1992) Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys Rev A 45(6):3403–3411. https://doi.org/10.1103/physreva.45.3403
17. Khurana U, Samulowitz H, Turaga D (2018) Feature engineering for predictive modeling using reinforcement learning. Proc Conf AAAI Artif Intell 32(1). https://doi.org/10.1609/aaai.v32i1.11678
18. Koizumi Y, Saito S, Uematsu H, Harada N, Imoto K (2019) ToyADMOS: A dataset of miniature-machine operating sounds for anomalous sound detection. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). https://doi.org/10.1109/waspaa.2019.8937164
19. Kovács PP, Schimmel J (2016) Higher-dimensional signal processing for vibrational analysis. In Proceedings of the 43rd International Congress on Noise Control Engineering (pp. 3086–3093)
20. Krajewski J, Schnieder S, Sommer D, Batliner A, Schuller B (2012) Applying multiple classifiers and non-linear dynamics features for detecting sleepiness from speech. Neurocomputing 84:65–75. https://doi.org/10.1016/j.neucom.2011.12.021
21. Langone R, Alzate C, De Ketelaere B, Vlasselaer J, Meert W, Suykens JAK (2015) LS-SVM based spectral clustering and regression for predicting maintenance of industrial machines. Eng Appl Artif Intell 37:268–278. https://doi.org/10.1016/j.engappai.2014.09.008
22. Lartillot O, Toiviainen P (2007) MIR in Matlab (II): A Toolbox for Musical Feature Extraction from Audio. ISMIR
23. Lathrop D (2015) Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering, Steven H. Strogatz, Westview Press, 2015, 2nd ed. (528 pp.). ISBN 978-0-813-34910-7. Phys Today 68(4):54–55. https://doi.org/10.1063/pt.3.2751
24. Lei X, Ji H, Xu Q, Ye T, Zhang S, Huang C (2022) Research on data diagnosis method of acoustic array sensor device based on spectrogram. Glob Energy Interconnect 5(4):418–433. https://doi.org/10.1016/j.gloei.2022.08.008
25. Liu C, Feng L, Liu G, Wang H, Liu S (2021) Bottom-up broadcast neural network for music genre classification. Multimed Tools Appl 80(5):7313–7331. https://doi.org/10.1007/s11042-020-09643-6
26. Ma H-G, Han C-Z (2006) Selection of embedding dimension and delay time in phase space reconstruction. Front Electr Electron Eng China 1(1):111–114. https://doi.org/10.1007/s11460-005-0023-7
27. Meyer A, Chlebus G, Rak M, Schindele D, Schostak M, van Ginneken B, Schenk A, Meine H, Hahn HK, Schreiber A, Hansen C (2021) Anisotropic 3D Multi-Stream CNN for Accurate Prostate Segmentation from Multi-Planar MRI. Comput Methods Programs Biomed 200:105821. https://doi.org/10.1016/j.cmpb.2020.105821
28. Nair V, Hinton G (2010) Rectified Linear Units Improve Restricted Boltzmann Machines. 27th International Conference on Machine Learning (ICML-10), Haifa
29. Park Y-J, Fan S-KS, Hsu C-Y (2020) A review on fault detection and process diagnostics in industrial processes. Processes (Basel) 8(9):1123. https://doi.org/10.3390/pr8091123
30. Purohit H, Tanabe R, Ichige T, Endo T, Nikaido Y, Suefusa K, Kawaguchi Y (2019) MIMII dataset: Sound dataset for malfunctioning industrial machine investigation and inspection. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019). https://doi.org/10.33682/m76f-d618
31. Shah A, Mizuno A, Linghai W, Weinstein A, Aizenstein H (2021) Prediction of cognitive function based on structural MRI images using a 3D convolutional neural net (CNN) among cognitively normal older adults. Biol Psychiatry 89(9, Supplement):S372. https://doi.org/10.1016/j.biopsych.2021.02.925
32. Shahzadi A, Ahmadyfard A, Harimi A, Yaghmaie K (2015) Speech emotion recognition using nonlinear dynamics features. Turk J Electr Eng Comput Sci 23:2056–2073. https://doi.org/10.3906/elk-1302-90
33. Shahzadi A, Ahmadyfard A, Yaghmaie K, Harimi A (2013) Recognition of emotion in speech using spectral patterns. Malaysian J Comput Sci 26(2):140–158. https://ejournal.um.edu.my/index.php/MJCS/article/view/6767
34. Shin J, Lee S (2023) Robust and lightweight deep learning model for industrial fault diagnosis in low-quality and noisy data. Electronics 12(2):409. https://www.mdpi.com/2079-9292/12/2/409
35. Sousa R, Antunes J, Coutinho F, Silva E, Santos J, Ferreira H (2019) Robust cepstral-based features for anomaly detection in ball bearings. Int J Adv Manuf Technol 103(5–8):2377–2390. https://doi.org/10.1007/s00170-019-03597-2
36. Srinivasu PN, JayaLakshmi G, Jhaveri RH, Praveen SP (2022) Ambient assistive living for monitoring the physical activity of diabetic adults through body area networks. Mob Inf Syst 2022:3169927. https://doi.org/10.1155/2022/3169927
37. Tagawa Y, Maskeliūnas R, Damaševičius R (2021) Acoustic anomaly detection of mechanical failures in noisy real-life factory environments. Electronics 10(19):2329. https://www.mdpi.com/2079-9292/10/19/2329
38. Takens F (1981) Detecting strange attractors in turbulence. Dynamical Systems and Turbulence, Warwick 1980, Berlin, Heidelberg
39. Tama BA, Vania M, Kim I, Lim S (2022) An EfficientNet-Based Weighted Ensemble Model for Industrial Machine Malfunction Detection Using Acoustic Signals. IEEE Access 10:34625–34636. https://doi.org/10.1109/ACCESS.2022.3160179
40. Wang L, Sun G, Wang Y, Ma J, Zhao X, Liang R (2022) AFExplorer: Visual analysis and interactive selection of audio features. Vis Inform 6(1):47–55. https://doi.org/10.1016/j.visinf.2022.02.003
41. Wang Y, Chen X, Jiang C (2019) Multidimensional representation learning for audio signal processing. In Proceedings of the 2019 International Joint Conference on Neural Networks (pp 1–7)
42. Yu H, Wang K, Li Y, He M (2021) Deep subclass reconstruction network for fault diagnosis of rotating machinery under various operating conditions. Appl Soft Comput 112:107755. https://doi.org/10.1016/j.asoc.2021.107755
43. Yu L, Yao X, Yang J, Li C (2020) Gear fault diagnosis through vibration and acoustic signal combination based on convolutional neural network. Information 11(5)
44. Zabin M, Choi H-J, Uddin J (2022) Hybrid deep transfer learning architecture for industrial fault diagnosis using Hilbert transform and DCNN–LSTM. J Supercomput. https://doi.org/10.1007/s11227-022-04830-8
45. Zheng F, Zhang G, Song Z (2001) Comparison of different implementations of MFCC. J Comput Sci Technol 16(6):582–589. https://doi.org/10.1007/bf02943243

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Shahrood Branch, Islamic Azad University, Shahrood, Iran
Mohsen Khanjari, Azita Azarfar, Mohamad Hosseini Abardeh & Esmail Alibeiki

Corresponding author

Correspondence to Azita Azarfar.

Ethics declarations

Ethics approval

The authors did not receive support from any organization for the submitted work.

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Khanjari, M., Azarfar, A., Abardeh, M.H. et al. Anomalous sound detection for machine condition monitoring using 3D tensor representation of sound and 3D deep convolutional neural network. Multimed Tools Appl 83, 44101–44119 (2024). https://doi.org/10.1007/s11042-023-17043-9

Received: 21 April 2023
Revised: 19 July 2023
Accepted: 08 September 2023
Published: 17 October 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11042-023-17043-9
