Deep convolutional neural network based secure wireless voice communication for underground mines

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

A secure wireless voice communication system for underground miners is essential for efficient and safe mining. Voice over internet protocol (VoIP) is a proven solution for wireless communication in underground mines, where cellular and satellite networks cannot be deployed. However, the security of the wireless network remains the major obstacle to reliable operation of such a system. A secure voice communication system has been developed by integrating a VoIP system with a trained deep convolutional neural network (DCNN) model. Experimental results indicated that the voice recognition accuracy of the developed DCNN-based model was 93.7% in a noiseless environment, compared with 82.1% and 79% for the existing K-nearest-neighbour (KNN) and support vector machine (SVM) algorithms, respectively. The voice recognition response times of the DCNN, KNN, and SVM algorithms were 178, 220, and 228 ms, respectively. Deployment of the developed secure and robust voice communication system would therefore improve safety and productivity in underground mines.
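To put the reported figures side by side, the short Python sketch below computes how far the DCNN model is ahead of each baseline. It uses only the accuracy and response-time values quoted in the abstract; the dictionary names and the helper `compare_to_dcnn` are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (noiseless environment).
accuracy_pct = {"DCNN": 93.7, "KNN": 82.1, "SVM": 79.0}
response_ms = {"DCNN": 178, "KNN": 220, "SVM": 228}


def compare_to_dcnn(baseline):
    """Compare the DCNN figures with one baseline algorithm.

    Returns a tuple of:
      - accuracy gain in percentage points,
      - response-time saving in milliseconds,
      - response-time saving relative to the baseline, in percent.
    """
    gain = round(accuracy_pct["DCNN"] - accuracy_pct[baseline], 1)
    saved = response_ms[baseline] - response_ms["DCNN"]
    relative = round(100 * saved / response_ms[baseline], 1)
    return gain, saved, relative


for name in ("KNN", "SVM"):
    gain, saved, relative = compare_to_dcnn(name)
    print(f"DCNN vs {name}: +{gain} points accuracy, "
          f"{saved} ms faster ({relative}% lower response time)")
```

On these numbers the DCNN model is 11.6 and 14.7 percentage points more accurate than KNN and SVM, and responds 42 ms and 50 ms sooner, respectively.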

Figs. 1–19 (images not reproduced). Fig. 9 caption fragment: source voice into the target voice using DCNN based pre-trained model

Acknowledgements

The authors are grateful to Dr. Pradeep K. Singh, Director of CSIR-Central Institute of Mining and Fuel Research, Dhanbad, India, for permission to publish this paper. The authors are also thankful to the Ministry of Electronics and Information Technology, Government of India, for supporting this study.

Author information

Corresponding author

Correspondence to S. K. Chaulya.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dey, P., Kumar, C., Mitra, M. et al. Deep convolutional neural network based secure wireless voice communication for underground mines. J Ambient Intell Human Comput 12, 9591–9610 (2021). https://doi.org/10.1007/s12652-020-02700-w

  • DOI: https://doi.org/10.1007/s12652-020-02700-w
