Abstract
This chapter presents the methods currently used for sparse optimization in speech. It also demonstrates how sparse representations can be constructed for classification and recognition tasks, and gives an overview of recent results obtained with sparse representations.
Notes
1. Note that the Gaussian means we refer to in this work are built from the original training data, not the projected \(H\beta\) features.
2. Using SRs to compute accuracy is described in [14].
3. We have not included the accuracy of the HMM because it takes into account sequence information, which neither the GMM nor the SR method does.
References
Deselaers T, Heigold G, Ney H (2007) Speech recognition with state-based nearest neighbour classifiers. In: Proceedings of Interspeech.
Gemmeke JF, Virtanen T (2010) Noise robust exemplar-based connected digit recognition. In: Proceedings of the ICASSP.
Sainath TN, Carmi A, Kanevsky D, Ramabhadran B (2010) Bayesian compressive sensing for phonetic classification. In: Proceedings of the ICASSP.
De Wachter M, Demuynck K, Van Compernolle D, Wambacq P (2003) Data driven example based continuous speech recognition. In: Proceedings of the European conference on speech communication and technology.
Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. Winston and Sons, Washington
Wright J, Yang A, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31:210–227
Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) ABCS: approximate Bayesian compressive sensing. Technical Report, Human Language Technologies, IBM
Sainath TN, Nahamoo D, Kanevsky D, Ramabhadran B, Shah PM (2011) A convex hull approach to sparse representations for exemplar-based speech recognition. In: Proceedings of the ASRU.
Sainath T, Ramabhadran B, Olsen P, Kanevsky D, Nahamoo D (2011) A-Functions: a generalization of extended Baum-Welch transformations to convex optimization. In: Proceedings of the ICASSP.
Kanevsky D, Sainath TN, Ramabhadran B, Nahamoo D (2010) An analysis of sparseness and regularization in exemplar-based methods for speech classification. In: Proceedings of Interspeech.
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288
Ji S, Xue Y, Carin L (2008) Bayesian compressive sensing. IEEE Trans Signal Process 56:2346–2356
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320
Sainath TN, Ramabhadran B, Nahamoo D, Kanevsky D, Sethy A (2010) Exemplar-based sparse representation features for speech recognition. In: Proceedings of Interspeech.
Sainath TN, Nahamoo D, Ramabhadran B, Kanevsky D, Goel V, Shah PM (2011) Exemplar-based sparse representation phone identification features. In: Proceedings of the ICASSP.
Lamel L, Kassel R, Seneff S (1986) Speech database development: design and analysis of the acoustic-phonetic corpus. In: Proceedings of the DARPA speech recognition workshop.
Kingsbury B (2009) Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling. In: Proceedings of the ICASSP.
De Wachter M, Matton M, Demuynck K, Wambacq P, Cools R, Van Compernolle D (2007) Template based continuous speech recognition. IEEE Trans Audio Speech Lang Process 15(4):1377–1390
Sainath TN, Ramabhadran B, Nahamoo D, Kanevsky D, Sethy A (2012) Enhancing exemplar-based posteriors for speech recognition tasks. In: Proceedings of Interspeech.
Bellegarda J, Nahamoo D (1990) Tied mixture continuous parameter modeling for speech recognition. IEEE Trans Acoust Speech Signal Process 38(12):2033–2045
Sainath TN, Ramabhadran B, Picheny M, Nahamoo D, Kanevsky D (2011) Exemplar-based sparse representation features: from TIMIT to LVCSR. IEEE Trans Audio Speech Lang Process 19(8):2598–2613
Candes EJ, Romberg J, Tao T (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52:489–509
Candes EJ (2006) Compressive sampling. Proceedings of the international congress of mathematicians, European Mathematical Society, Madrid, Spain
Gopalakrishnan PS, Kanevsky D, Nahamoo D, Nadas A (1991) An inequality for rational functions with applications to some statistical estimation problems. IEEE Trans Inf Theory 37(1):107–113
Povey D (2003) Discriminative training for large vocabulary speech recognition. Ph.D. thesis, Cambridge University.
Sainath T, Ramabhadran B, Olsen P, Kanevsky D, Nahamoo D (2011) Convergence of line search A-function methods. In: Proceedings of Interspeech.
Kanevsky D (2005) Extended Baum transformations for general functions, II. Technical Report, RC23645(W0506–120), Human Language Technologies, IBM
Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) Extended compressed sensing: filtering inspired methods for sparse signal recovery and their nonlinear variants. Technical Report, RC24785, Human Language Technologies, IBM.
Carmi A, Gurfil P, Kanevsky D, Ramabhadran B (2009) ABCS: approximate Bayesian compressed sensing. Technical Report, RC24816, Human Language Technologies, IBM.
Carmi A, Gurfil P, Kanevsky D (2010) Methods for signal recovering using Kalman filtering with embedded pseudo-measurement norms and quasi-norms. IEEE Trans Signal Process 58(4):2405–2409
Horesh L, Gurfil P, Ramabhadran B, Kanevsky D, Carmi A, Sainath TN (2010) Kalman filtering for compressed sensing. In: Proceedings of the information fusion, Edinburgh.
Ji S, Xue Y, Carin L (2008) Bayesian compressive sensing. IEEE Trans Signal Process 56:2346–2356
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–451
Carmi A, Gurfil P (2009) Convex feasibility programming for compressed sensing. Technical Report, Technion
Mount D, Arya S (2006) ANN: a library for approximate nearest neighbor searching. Software available at http://www.cs.umd.edu/~mount/ANN/
Chang C, Lin C (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Kanevsky D (2004) Extended Baum transformations for general functions. In: Proceedings of the ICASSP.
Povey D, Kanevsky D, Kingsbury B, Ramabhadran B, Saon G, Visweswariah K (2008) Boosted MMI for model and feature space discriminative training. In: Proceedings of the ICASSP.
Chang H, Glass J (2007) Hierarchical large-margin Gaussian mixture models for phonetic classification. In: Proceedings of the ASRU.
Sainath TN, Ramabhadran B, Picheny M (2009) An exploration of large vocabulary tools for small vocabulary phonetic recognition. In: Proceedings of the ASRU.
Saon G, Zweig G, Kingsbury B, Mangu L, Chaudhari U (2003) An architecture for rapid decoding of large vocabulary conversational speech. In: Proceedings of Eurospeech.
Deng L, Yu D (2007) Use of differential cepstra as acoustic features in hidden trajectory modeling for phonetic recognition. In: Proceedings of the ICASSP.
Halberstadt A, Glass J (1998) Heterogeneous measurements and multiple classifiers for speech recognition. In: Proceedings of the ICSLP.
Mohamed A, Sainath TN, Dahl G, Ramabhadran B, Hinton GE, Picheny M (2011) Deep belief networks using discriminative features for phone recognition. In: Proceedings of the ICASSP.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sainath, T.N., Kanevsky, D., Nahamoo, D., Ramabhadran, B., Wright, S. (2014). Sparse Representations for Speech Recognition. In: Carmi, A., Mihaylova, L., Godsill, S. (eds) Compressed Sensing & Sparse Filtering. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38398-4_15
DOI: https://doi.org/10.1007/978-3-642-38398-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38397-7
Online ISBN: 978-3-642-38398-4
eBook Packages: Engineering, Engineering (R0)