Telephony speech system performance based on the codec effect

Hamidi, Mohamed; Zealouk, Ouissam; Satori, Hassan

doi:10.1007/s12243-023-00968-5

Telephony speech system performance based on the codec effect

Published: 31 May 2023

Volume 78, pages 617–625, (2023)
Cite this article

Annals of Telecommunications Aims and scope Submit manuscript

77 Accesses
Explore all metrics

Abstract

This paper is a part of our contribution to research on the enhancement of network automatic speech recognition system performance. We built a highly configurable platform by using hidden Markov models, Gaussian mixture models, and Mel frequency spectral coefficients, in addition to VoIP G.711-u and GSM codecs. To determine the optimal values for maximum performance, different acoustic models are prepared by varying the hidden Markov models (from 3 to 5) and Gaussian mixture models (8–16-32) with 13 feature extraction coefficients. Additionally, our generated acoustic models are tested by unencoded and encoded speech data based on G.711 and GSM codecs. The best parameterization performance is obtained for 3 HMM, 8–16 GMMs, and G.711 codecs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A comprehensive survey on automatic speech recognition using neural networks

Article 15 August 2023

Automatic speech recognition: a survey

Article 10 November 2020

Speech Emotion Recognition: A Comprehensive Survey

Article 08 March 2023

Data Availability

The speech database utilized in this study belongs to the laboratory and is its property.

References

Walid M, Bousselmi S, Dabbabi K, Cherif A (2019) Real-time implementation of isolated-word speech recognition system on raspberry Pi 3 using WAT-MFCC. IJCSNS 19(3):42
Google Scholar
Hamidi M, Zealouk O, Satori H, Laaidi N, Salek A (2022) COVID-19 assessment using HMM cough recognition system. Int J Inf Technol 1–9
Kim HK, Cox RV (2001) A bitstream-based front-end for wireless speech recognition on IS-136 communications system. IEEE Trans Speech Audio Process 9(5):558–568
Article Google Scholar
Lilly BT, Paliwal KK (1996) Effect of speech coders on speech recognition performance. In Proceedings of ICSLP, 2344–2347
Das TK, Nahar KM (2016) A voice identification system using hidden Markov model. Indian J Sci Technol 9(4)
Satori H, Elhaoussi F (2014) Investigation Amazigh speech recognition using CMU tools. Int J Speech Technol 17(3):235–243
Article Google Scholar
Karan B, Sahoo J, Sahu PK (2015) Automatic speech recognition based Odia system. In Microwave, Optical and Communication Engineering (ICMOCE), International Conference on (pp. 353–356). IEEE
Micolini O, Herrera A, Erlang AM (2013) Traffic analysis over a VoIP server. 11(1):370–375
Handley M, Schulzrinne H, Schooler H et al (1999) RFC 2543. Session Initiation Protocol, SIP
Google Scholar
RFC3550-IETF, R. T. P. (2003) A transport protocol for real-time applications internet engineering Task Force
Kumar A, Thorenoor SG (2011) Analysis of IP Network for different quality of service. In International Symposium on Computing, Communication, and Control (ISCCC), Proc. of CSIT Vol. 1
Karapantazis S, Pavlidou FN (2009) VoIP: a comprehensive survey on a promising technology. Comput Netw 53(12):2050–2090
Article Google Scholar
Zealouk O, Satori H, Hamidi M, Laaidi N, Satori K (2018) Vocal parameters analysis of smoker using Amazigh language. Int J Speech Technol 21(1):85–91
Article Google Scholar
Zealouk O, Satori H, Hamidi M, Satori K (2019) Speech recognition for moroccan dialects: feature extraction and classification methods. J Adv Res Dyn Control Syst 11(2):1401–1408
Google Scholar
Lounnas K, Abbas M, Lichouri M, Hamidi M, Satori H, Teffahi H (2022) Enhancement of spoken digits recognition for under-resourced languages: case of Algerian and Moroccan dialects. Int J Speech Technol 25(2):443–455
Article Google Scholar
Zealouk O, Satori H, Hamidi M, Satori K (2018. Voice pathology assessment based on automatic speech recognition using Amazigh digits. In Proceedings of the 2nd International Conference on Smart Digital Environment. ACM, pp. 100–105
Hamidi M, Satori H, Zealouk O, Satori K, Laaidi N (2018) Interactive voice response server voice network administration using hidden markov model speech recognition system. In 2018 Second World Conference on Smart Trends in Systems, Secur Sustain (WorldS4) (pp. 16–21). IEEE
Zealouk O, Hamidi M, Satori H, Satori K (2020) Amazigh digits speech recognition system under noise car environment. In Embedded systems and artificial intelligence: Proceedings of ESAI 2019, Fez, Morocco (pp. 421–428). Springer Singapore
Boutazart Y, Satori H, Anselme RAM, Hamidi M, Satori K (2023) COVID-19 dataset clustering based on K-means and EM algorithms. Int J Adv Comput Sci Appl 14(3):924–934
Google Scholar
Zheng F, Zhang G, Song Z (2001) Comparison of different implementations of MFCC. J Comput Sci Technol 16(6):582–589
Article MATH Google Scholar
Shattuck-Hufnagel S, Klatt DH (1979) The limited use of distinctive features and markedness in speech production: evidence from speech error data. J Verbal Learn Verbal Behav 18(1):41–55
Article Google Scholar
Fosler-Lussier E, Morgan N (1999) Effects of speaking rate and word frequency on pronunciations in convertional speech. Speech Commun 29(2–4):137–158
Article Google Scholar
Lero RD, Exton C, Le Gear A (2019) Communications using a speech-to-text-to-speech pipeline. In 2019 International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob) (pp. 1–6). IEEE
Drude L, Heymann J, Schwarz A, Valin JM (2021) Multi-channel Opus compression for far-field automatic speech recognition with a fixed bitrate budget. Preprint arXiv:2106.07994
Das S, Choudhury P (2020) Evaluation of perceived speech quality for VoIP codecs under different loudness and background noise condition. In Proceedings of the 21st International Conference on Distributed Computing and Networking (pp. 1–5)
Bakri A, Amrouche A, Abbas M, Bouchakour L (2018) Automatic speech recognition for VoIP with packet loss concealment. Procedia Comput Sci 128:72–78
Article Google Scholar
Hamidi M, Zealouk O, Satori H (2023) Automatic speech recognition analysis over wireless networks. In: Bhateja, V., Yang, XS., Chun-Wei Lin, J., Das, R. (eds) Intelligent data engineering and analytics. FICTA 2022. Smart Innovation, Systems and Technologies, vol 327. Springer, Singapore
Shah SAA, ul Asar A, Shaukat SF (2009) Neural network solution for secure interactive voice response. World Appl Sci J 6(9):1264–1269, ISSN 1818- 4952
Ahmad J, Fiaz M, Kwon SI, Sodanil M, Vo B, Baik SW (2016) Gender identification using MFCC for telephone applications-a comparative study, arXiv preprint arXiv: 1601.01577
Hamidi M, Satori H, Zealouk O, Satori K (2020) Amazigh digits through interactive speech recognition system in noisy environment. Int J Speech Technol 23(1):101–109
Article Google Scholar
Hamidi M, Satori H, Zealouk O, Satori K (2020) Interactive voice application-based amazigh speech recognition. In Embedded Systems and Artificial Intelligence (pp. 271–279). Springer, Singapore
Hamidi M, Satori H, Zealouk O, Satori K (2019) Speech coding effect on amazigh alphabet speech recognition performance. J Adv Res Dyn Control Syst 11(2):1392–1400
Google Scholar

Download references

Author information

Authors and Affiliations

Team of Modeling and Scientific Computing, FPN, UMP, Oujda, Morocco
Mohamed Hamidi
Department of Mathematics and Computer Science, LISAC, FSDM, USMBA, Fez, Morocco
Ouissam Zealouk & Hassan Satori

Authors

Mohamed Hamidi
View author publications
You can also search for this author in PubMed Google Scholar
Ouissam Zealouk
View author publications
You can also search for this author in PubMed Google Scholar
Hassan Satori
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mohamed Hamidi.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Hamidi, M., Zealouk, O. & Satori, H. Telephony speech system performance based on the codec effect. Ann. Telecommun. 78, 617–625 (2023). https://doi.org/10.1007/s12243-023-00968-5

Download citation

Received: 21 April 2022
Accepted: 13 May 2023
Published: 31 May 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s12243-023-00968-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Telephony speech system performance based on the codec effect

Abstract

Access this article

Similar content being viewed by others

A comprehensive survey on automatic speech recognition using neural networks

Automatic speech recognition: a survey

Speech Emotion Recognition: A Comprehensive Survey

Data Availability

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Telephony speech system performance based on the codec effect

Abstract

Access this article

Similar content being viewed by others

A comprehensive survey on automatic speech recognition using neural networks

Automatic speech recognition: a survey

Speech Emotion Recognition: A Comprehensive Survey

Data Availability

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation