Substitution of G.728 vocoder’s codebook search module with SOM array trained by PSO-optimized supervised algorithm

  • Original Article
  • Neural Computing and Applications

Abstract

Low-delay code-excited linear prediction (LD-CELP) is an attractive algorithm for implementing vocoders in voice over Internet Protocol (VoIP) networks. The algorithm, standardized as ITU-T G.728, codes speech at 16 kbps with toll quality. However, operation at transmission rates below 16 kbps is desirable so that traffic can still be accommodated during system overload. In this paper, an array of self-organizing maps (SOMs) replaces the conventional codebook search module recommended in ITU-T G.728 and determines the optimum index of the shape codebook. The SOMs are trained with a modified supervised algorithm, some of whose training parameters are optimized by the particle swarm optimization (PSO) algorithm. Based on the occurrence frequencies of the codevectors, six bits are allocated to the shape codebook and two bits to the gain codebook, which gives a lower bit rate than the standard ITU-T G.728 vocoder. Compared with a conventional LD-CELP implementation, using the SOM array trained by the PSO-optimized supervised algorithm as the codebook search module reduces the execution time of the algorithm by up to 44 %, while the degradation in voice quality, measured by mean opinion score (MOS), perceptual evaluation of speech quality (PESQ) and segmental signal-to-noise ratio (SNRseg), remains acceptable.
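The bit-rate reduction and the search that the SOM array is meant to replace can be made concrete with a short Python sketch. It assumes the G.728 framing (8 kHz sampling and 5-sample excitation vectors, so 1,600 shape/gain index pairs per second, with a 7-bit shape and 3-bit gain codebook in the standard). The search shown is a plain brute-force analysis-by-synthesis loop that minimises ||x - g*H*y_j||^2 over every shape/gain pair. The ITU-T reference search is organised more efficiently, and the codebook and gain values below are random, hypothetical placeholders, not the G.728 tables or the authors' SOM array.

```python
import random

# Assumed G.728 framing: 8 kHz sampling, 5-sample excitation vectors.
SAMPLE_RATE_HZ = 8000
VECTOR_DIM = 5


def bit_rate_bps(shape_bits: int, gain_bits: int) -> float:
    """Bit rate when every 5-sample vector is sent as one shape index plus one gain index."""
    vectors_per_second = SAMPLE_RATE_HZ / VECTOR_DIM  # 1600 vectors per second
    return vectors_per_second * (shape_bits + gain_bits)


def exhaustive_search(target, filtered_shapes, gains):
    """Brute-force analysis-by-synthesis codebook search (illustrative, not the G.728 routine).

    target          -- target vector x (list of floats)
    filtered_shapes -- shape codevectors already passed through the
                       synthesis/weighting filter, i.e. H*y_j
    gains           -- candidate gain values g_i
    Returns (shape_index, gain_index, distortion) minimising ||x - g_i * H y_j||^2.
    """
    best = (None, None, float("inf"))
    for j, y in enumerate(filtered_shapes):
        for i, g in enumerate(gains):
            # One distortion evaluation per shape/gain pair.
            d = sum((xk - g * yk) ** 2 for xk, yk in zip(target, y))
            if d < best[2]:
                best = (j, i, d)
    return best


if __name__ == "__main__":
    print(bit_rate_bps(7, 3))  # standard G.728 allocation: 7-bit shape + 3-bit gain
    print(bit_rate_bps(6, 2))  # reduced allocation described in the abstract

    rng = random.Random(0)
    # Hypothetical random codebook: 2**6 shapes and 2**2 gains (placeholder values).
    shapes = [[rng.gauss(0, 1) for _ in range(VECTOR_DIM)] for _ in range(2 ** 6)]
    gains = [-1.5, -0.5, 0.5, 1.5]
    x = [1.5 * v for v in shapes[17]]  # target built from shape 17 with gain 1.5
    print(exhaustive_search(x, shapes, gains))
```

Under these assumptions the 6 + 2 bit allocation gives 12.8 kbit/s instead of 16 kbit/s, and the brute-force loop evaluates 256 shape/gain pairs per vector instead of 1,024. The SOM array described above aims to avoid this search by mapping the target directly to a shape index; its supervised, PSO-tuned training is not reproduced in this sketch.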

Author information

Correspondence to Mansour Sheikhan.

About this article

Cite this article

Sheikhan, M., Garoucy, S. Substitution of G.728 vocoder’s codebook search module with SOM array trained by PSO-optimized supervised algorithm. Neural Comput & Applic 23, 2309–2321 (2013). https://doi.org/10.1007/s00521-012-1183-z

  • DOI: https://doi.org/10.1007/s00521-012-1183-z
