Sparse Representations for Speech Recognition

Chapter in: Compressed Sensing & Sparse Filtering

Abstract

This chapter presents the methods currently used for sparse optimization in speech processing. It also demonstrates how sparse representations can be constructed for classification and recognition tasks, and gives an overview of recent results obtained with them.

Notes

  1. Note that the Gaussian means we refer to in this work are built from the original training data, not from the projected \(H\beta\) features.

  2. The use of sparse representations (SRs) to compute accuracy is described in [14].

  3. We have not included the accuracy of the HMM because it takes sequence information into account, which neither the GMM nor the SR method does.

References

  1. Deselaers T, Heigold G, Ney H (2007) Speech recognition with state-based nearest neighbour classifiers. In: Proceedings of the interspeech.

  2. Gemmeke JF, Virtanen T (2010) Noise robust exemplar-based connected digit recognition. In: Proceedings of the ICASSP.

  3. Sainath TN, Carmi A, Kanevsky D, Ramabhadran B (2010) Bayesian compressive sensing for phonetic classification. In: Proceedings of the ICASSP.

  4. De Wachter M, Demuynck K, Van Compernolle D, Wambacq P (2003) Data driven example based continuous speech recognition. In: Proceedings of the european conference on speech communication and technology.

  5. Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. Winston and Sons, Washington

  6. Wright J, Yang A, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31:210–227

  7. Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) ABCS: approximate Bayesian compressive sensing. Technical Report, Human Language Technologies, IBM

  8. Sainath TN, Nahamoo D, Kanevsky D, Ramabhadran B, Shah PM (2011) A convex hull approach to sparse representations for exemplar-based speech recognition. In: Proceedings of the ASRU.

  9. Sainath T, Ramabhadran B, Olsen P, Kanevsky D, Nahamoo D (2011) A-Functions: a generalization of extended Baum-Welch transformations to convex optimization. In: Proceedings of the ICASSP.

  10. Kanevsky D, Sainath TN, Ramabhadran B, Nahamoo D (2010) An analysis of sparseness and regularization in exemplar-based methods for speech classification. In: Proceedings of the interspeech.

  11. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B (Methodol.) 58(1):267–288

  12. Ji S, Xue Y, Carin L (2008) Bayesian compressive sensing. IEEE Trans Signal Process 56:2346–2356

  13. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J Roy Stat Soc Ser B 67:301–320

  14. Sainath TN, Ramabhadran B, Nahamoo D, Kanevsky D, Sethy A (2010) Exemplar-based sparse representation features for speech recognition. In: Proceedings of the interspeech.

  15. Sainath TN, Nahamoo D, Ramabhadran B, Kanevsky D, Goel V, Shah PM (2011) Exemplar-based sparse representation phone identification features. In: Proceedings of the ICASSP.

  16. Lamel L, Kassel R, Seneff S (1986) Speech database development: design and analysis of the acoustic-phonetic corpus. In: Proceedings of the DARPA speech recognition workshop.

  17. Kingsbury B (2009) Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling. In: Proceedings of the ICASSP.

  18. De Wachter M, Matton M, Demuynck K, Wambacq P, Cools R, Van Compernolle D (2007) Template based continuous speech recognition. IEEE Trans Audio Speech Lang Process 15(4):1377–1390

  19. Sainath TN, Ramabhadran B, Nahamoo D, Kanevsky D, Sethy A (2012) Enhancing exemplar-based posteriors for speech recognition tasks. In: Proceedings of the interspeech.

  20. Bellegarda J, Nahamoo D (1990) Tied mixture continuous parameter modeling for speech recognition. IEEE Trans Acoust Speech Signal Process 38(12):2033–2045

  21. Sainath TN, Ramabhadran B, Picheny M, Nahamoo D, Kanevsky D (2011) Exemplar-based sparse representation features: from TIMIT to LVCSR. IEEE Trans Audio Speech Lang Process 19(8):2598–2613

  22. Candes EJ, Romberg J, Tao T (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52:489–509

  23. Candes EJ (2006) Compressive sampling. Proceedings of the international congress of mathematicians, European Mathematical Society, Madrid, Spain

  24. Gopalakrishnan PS, Kanevsky D, Nahamoo D, Nadas A (1991) An inequality for rational functions with applications to some statistical estimation problems. IEEE Trans Inf Theory 37(1):107–113

  25. Povey D (2003) Discriminative training for large vocabulary speech recognition. Ph.D. thesis, Cambridge University.

  26. Sainath T, Ramabhadran B, Olsen P, Kanevsky D, Nahamoo D (2011) Convergence of line search A-function methods. In: Proceedings of the interspeech.

  27. Kanevsky D (2005) Extended Baum transformations for general functions, II. Technical Report, RC23645 (W0506–120), Human Language Technologies, IBM

  28. Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) Extended compressed sensing: filtering inspired methods for sparse signal recovery and their nonlinear variants. Technical Report, RC24785, Human Language Technologies, IBM.

  29. Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) ABCS: approximate Bayesian compressed sensing. Technical Report, RC24816, Human Language Technologies, IBM.

  30. Carmi A, Gurfil P, Kanevsky D (2010) Methods for signal recovering using Kalman filtering with embedded pseudo-measurement norms and quasi-norms. IEEE Trans Signal Process 58(4):2405–2409

  31. Horesh L, Gurfil P, Ramabhadran B, Kanevsky D, Carmi A, Sainath TN (2010) Kalman filtering for compressed sensing. In: Proceedings of the information fusion, Edinburgh.

  32. Ji S, Xue Y, Carin L (2008) Bayesian compressive sensing. IEEE Trans Signal Process 56:2346–2356

  33. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–451

  34. Carmi A, Gurfil P (2009) Convex feasibility programming for compressed sensing. Technical Report, Technion

  35. Mount D, Arya S (2006) ANN: a library for approximate nearest neighbor searching. Software available at http://www.cs.umd.edu/~mount/ANN/

  36. Chang C, Lin C (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  37. Kanevsky D (2004) Extended Baum transformations for general functions. In: Proceedings of the ICASSP.

  38. Povey D, Kanevsky D, Kingsbury B, Ramabhadran B, Saon G, Visweswariah K (2008) Boosted MMI for model and feature space discriminative training. In: Proceedings of the ICASSP.

  39. Chang H, Glass J (2007) Hierarchical large-margin Gaussian mixture models for phonetic classification. In: Proceedings of the ASRU.

  40. Sainath TN, Ramabhadran B, Picheny M (2009) An exploration of large vocabulary tools for small vocabulary phonetic recognition. In: Proceedings of the ASRU.

  41. Saon G, Zweig G, Kingsbury B, Mangu L, Chaudhari U (2003) An architecture for rapid decoding of large vocabulary conversational speech. In: Proceedings of the eurospeech.

  42. Deng L, Yu D (2007) Use of differential cepstra as acoustic features in hidden trajectory modeling for phonetic recognition. In: Proceedings of the ICASSP.

  43. Halberstadt A, Glass J (1998) Heterogeneous measurements and multiple classifiers for speech recognition. In: Proceedings of the ICSLP.

  44. Mohamed A, Sainath TN, Dahl G, Ramabhadran B, Hinton GE, Picheny M (2011) Deep belief networks using discriminative features for phone recognition. In: Proceedings of the ICASSP.

Corresponding author

Correspondence to Tara N. Sainath.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sainath, T.N., Kanevsky, D., Nahamoo, D., Ramabhadran, B., Wright, S. (2014). Sparse Representations for Speech Recognition. In: Carmi, A., Mihaylova, L., Godsill, S. (eds) Compressed Sensing & Sparse Filtering. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38398-4_15

  • DOI: https://doi.org/10.1007/978-3-642-38398-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38397-7

  • Online ISBN: 978-3-642-38398-4

  • eBook Packages: Engineering, Engineering (R0)
