Waveform based speech coding using nonlinear predictive techniques: a systematic review

Sheferaw, Gebremichael Kibret; Mwangi, Waweru; Kimwele, Michael; Mamuye, Adane

doi:10.1007/s10772-023-10072-7

Published: 21 December 2023

Volume 26, pages 1031–1059 (2023)

International Journal of Speech Technology

Gebremichael Kibret Sheferaw (ORCID: orcid.org/0009-0009-4700-0941)¹, Waweru Mwangi¹, Michael Kimwele¹ & Adane Mamuye²

Abstract

Speech coding compresses speech signals into a compact digital form for transmission or storage while preserving speech quality and intelligibility. This review aimed to identify and analyze the most effective waveform-based nonlinear prediction techniques for speech coding, including neural networks and polynomial filters. The study analyzed 29 publications from 2000 to 2023 and found that neural network-based models are widely used for speech compression. Recurrent neural network (RNN) topologies are favored because they can model the nonlinear and nonstationary behavior of speech. Although nonlinear adaptive prediction techniques have been explored for speech coding, further research is needed to optimize the adaptive algorithms used in these models. The review also identified a need for future research on quality performance and computational cost, and suggested further exploration of RNN predictor models. The study followed a systematic literature review methodology from computer science with three main phases: planning, conducting, and reporting. These phases were carried out in six stages: determining the research questions, defining the research approach, setting the study selection criteria, setting the quality measurement criteria, defining the data extraction strategy, and synthesizing the extracted data. Overall, the study highlights the need for continued research on developing and improving neural network-based speech compression models.
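The core idea behind waveform-based nonlinear predictive coding is to predict each sample from past samples with a nonlinear model and transmit only the quantized prediction residual. The sketch below illustrates that idea under stated assumptions. It is not a method taken from any of the 29 reviewed studies.

The sketch assumes the following:

- a backward-adaptive DPCM/ADPCM-style loop;
- a second-order Volterra (polynomial) predictor adapted with normalized LMS;
- a uniform residual quantizer;
- a synthetic test signal in place of real speech.

The names `VolterraNLMSPredictor`, `encode` and `decode`, and the parameter values (`order=4`, `mu=0.5`, `step=0.02`), are hypothetical choices made for illustration.

```python
import math


def volterra_features(past):
    """Second-order Volterra regressor: the linear terms past[i] followed by
    every quadratic product past[i] * past[j] with i <= j (no bias term)."""
    feats = list(past)
    for i in range(len(past)):
        for j in range(i, len(past)):
            feats.append(past[i] * past[j])
    return feats


class VolterraNLMSPredictor:
    """Hypothetical backward-adaptive quadratic Volterra predictor trained with
    normalized LMS. It only sees reconstructed samples, so a decoder running
    the same code stays in lockstep with the encoder without side information."""

    def __init__(self, order=4, mu=0.5, eps=1e-3):
        self.mu = mu        # NLMS step size (stable for 0 < mu < 2)
        self.eps = eps      # regularizer for small regressor norms
        n_feats = order + order * (order + 1) // 2
        self.w = [0.0] * n_feats
        self.past = [0.0] * order   # past[0] is the most recent sample
        self.feats = volterra_features(self.past)

    def predict(self):
        self.feats = volterra_features(self.past)
        return sum(w * f for w, f in zip(self.w, self.feats))

    def update(self, x_rec, x_pred):
        # NLMS step driven by the reconstructed residual, known to both ends
        err = x_rec - x_pred
        norm = self.eps + sum(f * f for f in self.feats)
        self.w = [w + self.mu * err * f / norm for w, f in zip(self.w, self.feats)]
        self.past = [x_rec] + self.past[:-1]


def quantize(value, step):
    """Uniform mid-tread quantizer; the integer index is what gets transmitted."""
    return round(value / step)


def encode(signal, step=0.02, order=4):
    pred = VolterraNLMSPredictor(order)
    codes, recon = [], []
    for x in signal:
        x_hat = pred.predict()
        q = quantize(x - x_hat, step)    # quantize the prediction residual
        x_rec = x_hat + q * step         # what the decoder will reconstruct
        pred.update(x_rec, x_hat)
        codes.append(q)
        recon.append(x_rec)
    return codes, recon


def decode(codes, step=0.02, order=4):
    pred = VolterraNLMSPredictor(order)
    out = []
    for q in codes:
        x_hat = pred.predict()
        x_rec = x_hat + q * step
        pred.update(x_rec, x_hat)
        out.append(x_rec)
    return out


def snr_db(ref, test):
    signal_power = sum(r * r for r in ref)
    noise_power = sum((r - t) ** 2 for r, t in zip(ref, test))
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Synthetic stand-in for voiced speech: two sinusoids plus a quadratic
    # distortion, so the signal is not purely linear (an assumption, not data).
    base = [0.5 * math.sin(2 * math.pi * 0.03 * n)
            + 0.2 * math.sin(2 * math.pi * 0.11 * n) for n in range(2000)]
    signal = [x + 0.3 * x * x for x in base]
    codes, recon = encode(signal)
    assert decode(codes) == recon
    print("SNR (dB):", round(snr_db(signal, recon), 1))
```

The predictor adapts only on reconstructed samples (backward adaptation), so no weights are transmitted and the decoder reproduces the encoder's predictions exactly. The reconstruction error never exceeds half a quantizer step. Swapping the Volterra regressor for a small neural network would give a neural nonlinear predictor in the same loop; this sketch does not attempt that.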

Data availability

The findings of this systematic literature review are based on previously published studies. All necessary references, including authors, titles, publication years, and sources, are provided in the reference section. No primary data were generated during this study. Readers can access the data through the original publications cited in this manuscript, which also give contact information for their corresponding authors. No additional datasets or supplementary materials were used. The methods, including study selection and data synthesis, are described in the Methods section, together with the search strategy and the databases searched. We acknowledge the authors of the included studies for their contributions to the existing literature.

Abbreviations

ADC: Analog-to-digital converter
ADPCM: Adaptive differential pulse code modulation
APCM: Adaptive pulse code modulation
ATC: Adaptive transform coding
BPTT: Backpropagation through time
CELP: Code-excited linear prediction
CNN: Convolutional neural network
CWT: Continuous wavelet transform
DCT: Discrete cosine transform
DM: Delta modulation
DPCM: Differential pulse code modulation
DWT: Discrete wavelet transform
FFT: Fast Fourier transform
GRUs: Gated recurrent units
IMA: Interactive Multimedia Association
ITU-T: International Telecommunication Union Telecommunication Standardization Sector
LMS: Least mean squares
LPC: Linear predictive coding
LSTM: Long short-term memory
MDCT: Modified discrete cosine transform
MELP: Mixed excitation linear prediction
MLP: Multilayer perceptron
MOS: Mean opinion score
MSE: Mean squared error
NN: Neural network
PCM: Pulse code modulation
POLQA: Perceptual Objective Listening Quality Assessment
RELP: Residual-excited linear prediction
RLS: Recursive least squares
RNN: Recurrent neural network
RTRL: Real-time recurrent learning
SBC: Sub-band coding
SEGSNR: Segmental signal-to-noise ratio
SELP: Stochastic excitation linear prediction
SLR: Systematic literature review
SNR: Signal-to-noise ratio
TIMIT: Texas Instruments/Massachusetts Institute of Technology acoustic-phonetic continuous speech corpus
VFC: Variance fractal compression
VoIP: Voice over Internet Protocol

References

Alipoor, G. H., & Savoji, M. H. (2006). Speech coding using non-linear prediction based on Volterra series expansion. SPECOM
Alipoor, G., & Savoji, M. H. (2007). Nonlinear speech coding using backward adaptive variable-length quadratic filters. In ISPA 2007 - Proceeding of the 5th international symposium on image and signal processing and analysis, (pp. 185–189). https://doi.org/10.1109/ISPA.2007.4383687.
Alipoor, G., & Savoji, M. H. (2012). Wide-band speech coding using kernel methods and bandwidth extension based on parametric stereo. In 2012 Proceedings of the 20th European signal processing conference (EUSIPCO) (pp. 2767–2771). IEEE
Alqushaibi, A., Abdulkadir, S. J., Rais, H. M., & Al-Tashi, Q. (2020). A review of weight optimization techniques in recurrent neural networks. In 2020 international conference on computational intelligence (ICCI) (pp. 196–201). IEEE
Ashdown, I. (2006, September). Extended parallel pulse code modulation of LEDs. In Sixth international conference on solid state lighting (Vol. 6337, pp. 169–178). SPIE. https://doi.org/10.1117/12.679674.
Bellec, G., Scherr, F., Hajek, E., Salaj, D., Legenstein, R., & Maass, W. (2019). Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets (pp. 1–37). http://arxiv.org/abs/1901.09049.
Berglund, K. (2004). Speech compression and tone detection in a real-time system
Besacier, L., Bergamini, C., Vaufreydaz, D., & Castelli, E. (2001, October). The effect of speech and audio compression on speech recognition performance. In 2001 IEEE fourth workshop on multimedia signal processing (Cat. No. 01TH8564) (pp. 301–306). IEEE.
Cernak, M., & Asaei, A. (2016). Cognitive speech coding (No. REP_WORK). Idiap
Chavan, K., Jawale, P., Pzatil, S., & Mumbai, N. (2016). SPEECH CODING. Vol. 40, no. 40, pp. 117–120.
Cho, K., van Merrienboer, B., Bahdanau, D., & Bengio, Y. (2015). On the properties of neural machine translation: Encoder–decoder approaches (pp. 103–111): https://doi.org/10.3115/v1/w14-4012.
D'Alessandro, G., Zanuy, M. F., & Piazza, F. (2002, May). A new subband non linear prediction coding algorithm for narrowband speech signal: The nADPCMB⊥ MLT coding scheme. In 2002 IEEE international conference on acoustics, speech, and signal processing (Vol. 1, pp. I-1025). IEEE. https://doi.org/10.1109/icassp.2002.5743969.
Despotovic, V., Görtz, N., & Peric, Z. (2012, September). Low-order volterra long-term predictors. In Speech communication; 10. ITG symposium (pp. 1–4). VDE
Despotović, V., & Perić, Z. (2013, November). Design of nonlinear predictors for adaptive predictive coding of speech signals. In 2013 21st telecommunications forum Telfor (TELFOR) (pp. 490–497). IEEE. https://doi.org/10.1109/TELFOR.2013.6716274.
Despotović, V., Görtz, N., & Perić, Z. (2012). Improved non-linear long-term predictors based on Volterra filters. International Symposium Electronics in Marine, 2, 231–234.
Faundez-Zanuy, M. (2015). Nonlinear predictive models computation in ADPCM schemes. In European signal processing conference (Vol. 2015, pp. 6–9, 2000).
Faúndez-Zanuy, M. (2003). Wide band sub-band speech coding using non-linear prediction. In ICASSP, IEEE international conference on acoustic speech signal processing—Proceedings (Vol. 2, no. 1, pp. 181–184) https://doi.org/10.1109/icassp.2003.1202324.
Faundez-Zanuy, M. (2005). Nonlinear speech processing: Overview and possibilities in speech coding. In Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics) (Vol. 3445 LNAI, no. 4, pp. 15–42). https://doi.org/10.1007/11520153_2.
Faúndez-Zanuy, M. (2001). Nonlinear vectorial prediction with neural nets. In Lecture notes in Computer Science (including Subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics), (Vol. 2085 LNCS, no. PART 2, pp. 754–761) https://doi.org/10.1007/3-540-45723-2_91.
Faúndez-Zanuy, M. (2003, June). Non-linear speech coding with MLP, RBF and Elman based prediction. In International work-conference on artificial neural networks (pp. 671–678). Berlin, Heidelberg. Springer. https://doi.org/10.1007/3-540-44869-1_85.
Faundez-Zanuy, M. (2006). Speech coding through adaptive combined nonlinear prediction. Speech Communication, 48(7), 838–847. https://doi.org/10.1016/j.specom.2005.09.007
Franeese, M. F. (1998). Marcos Fatindez-Zanuy *, pp. 345–348, 1998.
Abou Haidar, G., Achkar, R., & Dourgham, H. (2016, November). A comparative simulation study of the real effect of PCM, DM and DPCM systems on audio and image modulation. In 2016 IEEE international multidisciplinary conference on engineering technology (IMCET) (pp. 144–149). IEEE
Haque, M., & Bhattacharyya, K. (2016). A review on speech filtering and its different techniques. Journal of Engineering Technology, 4(1), 196–200.
Izumi, T., & Iiguni, Y. (2006). Data compression of nonlinear time series using a hybrid linear/nonlinear predictor. Signal Processing, 86(9), 2439–2446. https://doi.org/10.1016/j.sigpro.2005.11.013
Jagtap, S. K., Mulye, M. S., & Uplane, M. D. (2015). Speech coding techniques. Procedia Computer Science, 49(1), 253–263. https://doi.org/10.1016/j.procs.2015.04.251
Jayasankar, U., Thirumal, V., & Ponnurangam, D. (2021). A survey on data compression techniques: From the perspective of data quality, coding schemes, data type and applications. Journal of King Saud University-Computer and Information Sciences, 33(2), 119–140. https://doi.org/10.1016/j.jksuci.2018.05.006
Kaladharan, N. (2017). A review of different speech coding methods. International Journal of Electricals and Electronics Engineering Telecommunication, 6(2), 96–103.
Karpathy, A., Johnson, J., & Fei-Fei, L. (2015). Visualizing and understanding recurrent networks, pp. 1–12. http://arxiv.org/abs/1506.02078.
Keles, H. Y., Rozhon, J., Ilk, H. G., & Voznak, M. (2019). DeepVoCoder: A CNN model for compression and coding of narrow band speech. IEEE Access, 7, 75081–75089.
Kitchenham, B., & Charters, S. M. (2007). Guidelines for performing systematic literature reviews in software engineering, EBSE Technical Report EBSE-2007-01, Software Engineering Group School of Computer Science and Ma.
Kleijn, W. B., Lim, F. S., Luebs, A., Skoglund, J., Stimberg, F., Wang, Q., & Walters, T. C. (2018, April). Wavenet based low rate speech coding. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 676–680). IEEE. https://doi.org/10.1109/ICASSP.2018.8462529.
Kofod-Petersen, A. (2012). How to do a structured literature review in computer science. Ver. 0.1. October, 1
Laskov, L., Georgieva, V., & Dimitrov, K. (2020). Analysis of pulse code modulation in MATLAB/octave environment. In 2020 55th international science conference on information, communication energy system technology. (ICEST 2020-Proceeding) (pp. 77–80). https://doi.org/10.1109/ICEST49890.2020.9232755
Li, Z. N., Drew, M. S., & Liu, J. (2021). Basic audio compression techniques. In Fundamentals of Multimedia (pp. 479–504).
Ling, Z. H., Ai, Y., Gu, Y., & Dai, L. R. (2018). Waveform modeling and generation using hierarchical recurrent neural networks for speech bandwidth extension. IEEE/ACM Transactions on Audio Speech and Language Processing, 26(5), 883–894. https://doi.org/10.1109/TASLP.2018.2798811
Lotfidereshgi, R., & Gournay, P. (2018, April). Speech prediction using an adaptive recurrent neural network with application to packet loss concealment. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5394–5398). IEEE.
Mansour, C., Achkar, R., & Haidar, G. A. (2012). Simulation of DPCM and ADM systems. In Proceedings—2012 14th international conference modelling and simulation, (UKSim 2012) (no. 4, pp. 416–421). https://doi.org/10.1109/UKSim.2012.64.
Mishra, S. (2016). A survey paper on different data compression techniques Saumya Mishra Shraddha Singh.
Nassif, A. B., Shahin, I., Attili, I., Azzeh, M., & Shaalan, K. (2019). Speech recognition using deep neural networks: A systematic review. IEEE Access, 7, 19143–19165. https://doi.org/10.1109/ACCESS.2019.2896880
Nosouhian, S., Nosouhian, F., & Khoshouei, A. K. (2021). A review of recurrent neural network architecture for sequence learning: Comparison between LSTM and GRU. Preprints, 1–7. https://doi.org/10.20944/preprints202107.0252.v1.
O’Shaughnessy, D. (2023). Review of methods for coding of speech signals. EURASIP Journal on Audio, Speech, and Music Processing, 2023(1). https://doi.org/10.1186/s13636-023-00274-x
Bäckström, T. (2017). Speech coding with code-excited linear prediction (pp. 37–41). Springer.
Pandey, S., & Banerjee, A. (2022). Optimal non-uniform sampling by branch-and-bound approach for speech coding. IEEE Access, 10, 2797–2812. https://doi.org/10.1109/ACCESS.2021.3138068
Pérez-Ortiz, J. A., Calera-Rubio, J., & Forcada, M. L. (2001, September). A comparison between recurrent neural architectures for real-time nonlinear prediction of speech signals. In Neural networks for signal processing XI: Proceedings of the 2001 IEEE signal processing society workshop (IEEE Cat. No. 01TH8584) (pp. 73–81). IEEE. https://doi.org/10.1109/nnsp.2001.943112.
Polynomial, A., Volterra, V., & Wiener, N. (1958) 10. Adaptive Volterra Filters.
Qu, L., Lyu, J., Li, W., Ma, D., & Fan, H. (2021). Features injected recurrent neural networks for short-term traffic speed prediction. Neurocomputing, 451, 290–304. https://doi.org/10.1016/j.neucom.2021.03.054
Raina, S. B., Raina, R., & Agarwal, V. (2014). Wireless speech coding : A systematic review.
Ray, M., Chandra, M., & Patil, B. P. (2015). Speech coding techniques for VoIP applications: A technical review. World Applied Sciences Journal. https://doi.org/10.5829/idosi.wasj.2015.33.05.148
Riera-Palou, F., Den Brinker, A. C., & Gerrits, A. J. (2004, November). A hybrid parametric-waveform approach to bit stream scalable audio coding. In Conference record of the thirty-eighth asilomar conference on signals, systems and computers, 2004. (Vol. 2, pp. 2250–2254). IEEE. https://doi.org/10.1109/acssc.2004.1399568.
Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. https://doi.org/10.1016/j.physd.2019.132306
Somers, H. (1999). An overview of digital. Structure. https://doi.org/10.1016/B978-0-12-373580-5.50038-7
Stachurski, J., & McCree, A. (2000, September). Combining parametric and waveform-matching coders for low bit-rate speech coding. In 2000 10th European signal processing conference (pp. 1–4). IEEE.
Tanaka, H., & Shimamura, T. (2004, September). Nonlinear predictive analysis of speech by iterative approach. In 2004 12th European signal processing conference (pp. 2055–2058). IEEE
Taware, D., & Handore, S. (2014). Speech compression techniques. 2(12), 1–7.
Townshend, B. (1991). Nonlinear prediction of speech. In Proceedings of ICASSP, IEEE international conference on acoustics speech and signal processing (Vol. 1, pp. 425–428). https://doi.org/10.1109/icassp.1991.150367
USNA. (2021). Lesson 20: Analog to digital conversion. EC312 course notes. https://www.usna.edu/ECE/ec312/Lessons/wireless/EC312_Lesson_20_Analog_to_Digital_Course_Notes.pdf.
Varoglu, E., & Hacioglu, K. (2000). Recurrent neural network speech predictor based on dynamical systems approach. IEE Proceedings-Vision, Image and Signal Processing, 147(2), 149–156.
Wang, A., Sun, Z., & Zhang, X. (2002, June). A non-linear prediction speech coding system based on ANN. In Proceedings of the 4th world congress on intelligent control and automation (Cat. No. 02EX527) (Vol. 1, pp. 607–611). IEEE
Wang, G. (2006). Stability study of the SB-ADPCM coder. Signal Processing, 86(2), 319–330. https://doi.org/10.1016/j.sigpro.2005.05.011
Yan, W., Zhang, J., Zhang, S., & Wen, P. (2018). A novel pipelined neural IIR adaptive filter for speech prediction. Applied Acoustics, 141, 64–70. https://doi.org/10.1016/j.apacoust.2018.06.007
Yoshimura, T., Hashimoto, K., Oura, K., Nankaku, Y., & Tokuda, K. (2019, May). Speaker-dependent WaveNet-based delay-free ADPCM speech coding. In ICASSP 2019-2019 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 7145–7149). IEEE.
Zacarias-Morales, N., Pancardo, P., Hernández-Nolasco, J. A., & Garcia-Constantino, M. (2021). Attention-inspired artificial neural networks for speech processing: A systematic review. Symmetry (Basel), 13(2), 1–43. https://doi.org/10.3390/sym13020214
Zhang, G. A., Gu, J. Y., Bao, Z. H., Xu, C., & Zhang, S. B. (2014). Joint routing and channel assignment algorithms in cognitive wireless mesh networks. Transactions on Emerging Telecommunications and Technology, 25(3), 294–307. https://doi.org/10.1002/ett
Zhao, Z., Liu, H., & Fingscheidt, T. (2018, September). Nonlinear prediction of speech by echo state networks. In 2018 26th European signal processing conference (EUSIPCO) (pp. 2085–2089). IEEE. https://doi.org/10.23919/EUSIPCO.2018.8553190.
Zhao, H., & Zhang, J. (2009). Pipelined Chebyshev functional link artificial recurrent neural network for nonlinear adaptive filter. IEEE Transactions on Systems, Man, and Cybernetics, Part B Cybernetics, 40(1), 162–172. https://doi.org/10.1109/TSMCB.2009.2024313
Zhen, K., et al. (2022). Scalable and efficient neural speech coding: A hybrid design. IEEE/ACM Transactions on Audio Speech and Language Processing, 30, 12–25. https://doi.org/10.1109/TASLP.2021.3129353
Zhen, K., Sung, J., Lee, M. S., Beack, S., & Kim, M. (2021). Scalable and efficient neural speech coding: A hybrid design. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 12–25.

Acknowledgements

I would like to express my gratitude to the German Academic Exchange Service (DAAD) for providing funding for my PhD studies, including support for tuition fees, research expenses, and a stipend for living expenses. I am also thankful to Jomo Kenyatta University of Agriculture and Technology (JKUAT) for hosting me as a PhD student and providing invaluable academic resources and support.

Funding

This research was supported by a DAAD scholarship, which covered tuition fees, research expenses, and a living stipend during the first author's PhD studies at Jomo Kenyatta University of Agriculture and Technology (JKUAT), Kenya.

Author information

Authors and Affiliations

School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Juja, Nairobi, 62000, Nairobi, Kenya
Gebremichael Kibret Sheferaw, Waweru Mwangi & Michael Kimwele
School of Information Technology and Engineering, Addis Ababa University Institute of Technology, 5Kilo, Addis-Ababa, Addis-Ababa, Ethiopia
Adane Mamuye

Corresponding author

Correspondence to Gebremichael Kibret Sheferaw.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 9 and 10.

Table 9 List of publications and source information of the selected papers

Table 10 Information extracted from the studies in relation to the research questions

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sheferaw, G.K., Mwangi, W., Kimwele, M. et al. Waveform based speech coding using nonlinear predictive techniques: a systematic review. Int J Speech Technol 26, 1031–1059 (2023). https://doi.org/10.1007/s10772-023-10072-7

Received: 28 June 2023
Accepted: 17 November 2023
Published: 21 December 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10772-023-10072-7

Keywords
