Sparse Representations for Speech Recognition

Sainath, Tara N.; Kanevsky, Dimitri; Nahamoo, David; Ramabhadran, Bhuvana; Wright, Stephen

doi:10.1007/978-3-642-38398-4_15

Part of the book series: Signals and Communication Technology (SCT)

Abstract

This chapter presents the methods that are currently exploited for sparse optimization in speech. It also demonstrates how sparse representations can be constructed for classification and recognition tasks, and gives an overview of recent results that were obtained with sparse representations.

Notes

1. Note that the Gaussian means we refer to in this work are built from the original training data, not the projected \(H\beta\) features.
2. The use of SRs to compute accuracy is described in [14].
3. We have not included the accuracy of the HMM because it takes sequence information into account, which neither the GMM nor the SR method does.

References

1. Deselaers T, Heigold G, Ney H (2007) Speech recognition with state-based nearest neighbour classifiers. In: Proceedings of the Interspeech
2. Gemmeke JF, Virtanen T (2010) Noise robust exemplar-based connected digit recognition. In: Proceedings of the ICASSP
3. Sainath TN, Carmi A, Kanevsky D, Ramabhadran B (2010) Bayesian compressive sensing for phonetic classification. In: Proceedings of the ICASSP
4. De Wachter M, Demuynck K, Van Compernolle D, Wambacq P (2003) Data driven example based continuous speech recognition. In: Proceedings of the European Conference on Speech Communication and Technology
5. Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. Winston and Sons, Washington
6. Wright J, Yang A, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31:210–227
7. Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) ABCS: approximate Bayesian compressive sensing. Technical Report, Human Language Technologies, IBM
8. Sainath TN, Nahamoo D, Kanevsky D, Ramabhadran B, Shah PM (2011) A convex hull approach to sparse representations for exemplar-based speech recognition. In: Proceedings of the ASRU
9. Sainath T, Ramabhadran B, Olsen P, Kanevsky D, Nahamoo D (2011) A-Functions: a generalization of extended Baum-Welch transformations to convex optimization. In: Proceedings of the ICASSP
10. Kanevsky D, Sainath TN, Ramabhadran B, Nahamoo D (2010) An analysis of sparseness and regularization in exemplar-based methods for speech classification. In: Proceedings of the Interspeech
11. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B (Methodol) 58(1):267–288
12. Ji S, Xue Y, Carin L (2008) Bayesian compressive sensing. IEEE Trans Signal Process 56:2346–2356
13. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Statist Soc B 67:301–320
14. Sainath TN, Ramabhadran B, Nahamoo D, Kanevsky D, Sethy A (2010) Exemplar-based sparse representation features for speech recognition. In: Proceedings of the Interspeech
15. Sainath TN, Nahamoo D, Ramabhadran B, Kanevsky D, Goel V, Shah PM (2011) Exemplar-based sparse representation phone identification features. In: Proceedings of the ICASSP
16. Lamel L, Kassel R, Seneff S (1986) Speech database development: design and analysis of the acoustic-phonetic corpus. In: Proceedings of the DARPA Speech Recognition Workshop
17. Kingsbury B (2009) Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling. In: Proceedings of the ICASSP
18. De Wachter M, Matton M, Demuynck K, Wambacq P, Cools R, Van Compernolle D (2007) Template based continuous speech recognition. IEEE Trans Audio Speech Lang Process 15(4):1377–1390
19. Sainath TN, Ramabhadran B, Nahamoo D, Kanevsky D, Sethy A (2012) Enhancing exemplar-based posteriors for speech recognition tasks. In: Proceedings of the Interspeech
20. Bellegarda J, Nahamoo D (1990) Tied mixture continuous parameter modeling for speech recognition. IEEE Trans Acoust Speech Signal Process 38(12):2033–2045
21. Sainath TN, Ramabhadran B, Picheny M, Nahamoo D, Kanevsky D (2011) Exemplar-based sparse representation features: from TIMIT to LVCSR. IEEE Trans Audio Speech Lang Process 19(8):2598–2613
22. Candes EJ, Romberg J, Tao T (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52:489–509
23. Candes EJ (2006) Compressive sampling. In: Proceedings of the International Congress of Mathematicians, European Mathematical Society, Madrid, Spain
24. Gopalakrishnan PS, Kanevsky D, Nahamoo D, Nadas A (1991) An inequality for rational functions with applications to some statistical estimation problems. IEEE Trans Inf Theory 37(1):107–113
25. Povey D (2003) Discriminative training for large vocabulary speech recognition. Ph.D. thesis, Cambridge University
26. Sainath T, Ramabhadran B, Olsen P, Kanevsky D, Nahamoo D (2011) Convergence of line search A-function methods. In: Proceedings of the Interspeech
27. Kanevsky D (2005) Extended Baum transformations for general functions, II. Technical Report RC23645 (W0506-120), Human Language Technologies, IBM
28. Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) Extended compressed sensing: filtering inspired methods for sparse signal recovery and their nonlinear variants. Technical Report RC24785, Human Language Technologies, IBM
29. Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) ABCS: approximate Bayesian compressed sensing. Technical Report RC24816, Human Language Technologies, IBM
30. Carmi A, Gurfil P, Kanevsky D (2010) Methods for signal recovering using Kalman filtering with embedded pseudo-measurement norms and quasi-norms. IEEE Trans Signal Process 58(4):2405–2409
31. Horesh L, Gurfil P, Ramabhadran B, Kanevsky D, Carmi A, Sainath TN (2010) Kalman filtering for compressed sensing. In: Proceedings of the Information Fusion, Edinburgh
32. Ji S, Xue Y, Carin L (2008) Bayesian compressive sensing. IEEE Trans Signal Process 56:2346–2356
33. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–451
34. Carmi A, Gurfil P (2009) Convex feasibility programming for compressed sensing. Technical Report, Technion
35. Mount D, Arya S (2006) ANN: a library for approximate nearest neighbor searching. Software available at http://www.cs.umd.edu/~mount/ANN/
36. Chang C, Lin C (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
37. Kanevsky D (2004) Extended Baum transformations for general functions. In: Proceedings of the ICASSP
38. Povey D, Kanevsky D, Kingsbury B, Ramabhadran B, Saon G, Visweswariah K (2008) Boosted MMI for model and feature space discriminative training. In: Proceedings of the ICASSP
39. Chang H, Glass J (2007) Hierarchical large-margin Gaussian mixture models for phonetic classification. In: Proceedings of the ASRU
40. Sainath TN, Ramabhadran B, Picheny M (2009) An exploration of large vocabulary tools for small vocabulary phonetic recognition. In: Proceedings of the ASRU
41. Saon G, Zweig G, Kingsbury B, Mangu L, Chaudhari U (2003) An architecture for rapid decoding of large vocabulary conversational speech. In: Proceedings of the Eurospeech
42. Deng L, Yu D (2007) Use of differential cepstra as acoustic features in hidden trajectory modeling for phonetic recognition. In: Proceedings of the ICASSP
43. Halberstadt A, Glass J (1998) Heterogeneous measurements and multiple classifiers for speech recognition. In: Proceedings of the ICSLP
44. Mohamed A, Sainath TN, Dahl G, Ramabhadran B, Hinton GE, Picheny M (2011) Deep belief networks using discriminative features for phone recognition. In: Proceedings of the ICASSP

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Tara N. Sainath, Dimitri Kanevsky, David Nahamoo & Bhuvana Ramabhadran
University of Wisconsin, Madison, WI, USA
Stephen Wright

Corresponding author

Correspondence to Tara N. Sainath.

Editor information

Editors and Affiliations

Department of Mechanical and Aerospace Engineering, Nanyang Technical University, Singapore, Singapore
Avishy Y. Carmi
School of Computing and Communications, Lancaster University, Lancaster, United Kingdom
Lyudmila Mihaylova
Department of Engineering, University of Cambridge, Cambridge, United Kingdom
Simon J. Godsill

About this chapter

Cite this chapter

Sainath, T.N., Kanevsky, D., Nahamoo, D., Ramabhadran, B., Wright, S. (2014). Sparse Representations for Speech Recognition. In: Carmi, A., Mihaylova, L., Godsill, S. (eds) Compressed Sensing & Sparse Filtering. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38398-4_15

DOI: https://doi.org/10.1007/978-3-642-38398-4_15
Published: 13 September 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38397-7
Online ISBN: 978-3-642-38398-4
eBook Packages: Engineering, Engineering (R0)
