Recurrent Neural Network-Based Dictionary Learning for Compressive Speech Sensing

  • Yunyun Ji
  • Wei-Ping Zhu
  • Benoit Champagne


We propose a novel dictionary learning technique for compressive sensing of speech signals based on a recurrent neural network (RNN). First, we exploit the RNN to solve an \(\ell _{0}\)-norm optimization problem, built on a sequential linear prediction model, for estimating the linear prediction coefficients of voiced and unvoiced speech, respectively. The extracted linear prediction coefficient vectors are then clustered through an improved Linde–Buzo–Gray algorithm to generate separate codebooks for voiced and unvoiced speech. A dictionary is constructed for each type of speech by concatenating a union of structured matrices derived from the column vectors in the corresponding codebook. Next, a decision module selects the appropriate dictionary for the recovery algorithm in the compressive sensing system. Finally, based on the sequential linear prediction model and the proposed dictionary, a sequential recovery algorithm is proposed to further improve the quality of the reconstructed speech. Experimental results show that, compared to selected state-of-the-art approaches, the proposed method achieves superior performance in terms of several objective measures, including segmental signal-to-noise ratio, perceptual evaluation of speech quality and short-time objective intelligibility, under both noise-free and noise-aware conditions.
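The recovery step described above follows the standard compressive sensing pipeline: a speech frame that is sparse in a learned dictionary is reconstructed from a small number of random measurements. The sketch below illustrates that generic pipeline with orthogonal matching pursuit over a random dictionary; the dictionary, measurement matrix, frame length and sparsity level are illustrative assumptions, not the authors' RNN-learned configuration or their sequential recovery algorithm.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: find a sparse x with y ~ A @ x."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(sparsity):
        # Select the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit of y on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - A @ x
    return x

rng = np.random.default_rng(0)
n, m, k = 64, 256, 4                    # frame length, atoms, sparsity (assumed)
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary atoms
Phi = rng.standard_normal((32, n)) / np.sqrt(32)  # random measurement matrix

# Synthesize a frame that is k-sparse in D (stand-in for a speech frame).
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
s = D @ x_true                          # signal frame
y = Phi @ s                             # compressive measurements

x_hat = omp(Phi @ D, y, k)              # sparse coefficients from measurements
s_hat = D @ x_hat                       # reconstructed frame
```

In the paper's system, `D` would instead be the voiced or unvoiced dictionary built from the LBG codebook, chosen by the decision module before recovery.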


Recurrent neural network · Linear prediction coefficient · Clustering · Sequential recovery algorithm · Compressive sensing



This work is supported in part by the Natural Sciences and Engineering Research Council of Canada, the National Natural Science Foundation of China (Grant Nos. 61601248, 61771263, 61871241) and the University Natural Science Research Foundation of Jiangsu Province, China (Grant No. 16KJB510037).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Electronics and Information, Nantong University, Nantong, China
  2. Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
  3. Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
