Language Identification Using Spectrogram Texture
Conference paper
Abstract
This paper proposes a novel front-end for automatic spoken language recognition, based on the spectrogram representation of the speech signal and on the properties of the Fourier spectrum for detecting global periodicity in an image. The Local Phase Quantization (LPQ) texture descriptor is used to capture the spectrogram content. Results obtained for test signals of 30 seconds show that the method is very promising for low-cost language identification. The best performance is achieved when the proposed method is fused with the i-vector representation.
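The front-end described in the abstract can be illustrated with a minimal sketch: compute a log-magnitude spectrogram, treat it as a grayscale image, and summarize its texture with a basic LPQ histogram. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `spectrogram` and `lpq_histogram`, the frame length and hop size, the 7x7 LPQ window, and the use of the basic (non-decorrelated) LPQ variant with a uniform window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram (frequency bins x frames), Hann-windowed STFT.

    frame_len and hop are illustrative values, not the paper's settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)


def lpq_histogram(image, win=7):
    """Basic LPQ descriptor: normalized 256-bin histogram of 8-bit phase codes.

    For every win x win neighborhood, four low-frequency Fourier coefficients
    are computed; the signs of their real and imaginary parts give 8 bits.
    """
    image = np.asarray(image, dtype=float)
    r = win // 2
    x = np.arange(-r, r + 1)
    a = 1.0 / win                          # lowest non-zero frequency
    w0 = np.ones(win, dtype=complex)       # zero-frequency 1-D kernel
    w1 = np.exp(-2j * np.pi * a * x)       # frequency +a
    w2 = np.conj(w1)                       # frequency -a

    # All win x win neighborhoods that fit entirely inside the image.
    patches = sliding_window_view(image, (win, win))

    def respond(row_kernel, col_kernel):
        kernel = np.outer(row_kernel, col_kernel)
        return np.einsum('ijkl,kl->ij', patches, kernel)

    coeffs = [respond(w0, w1),   # horizontal frequency
              respond(w1, w0),   # vertical frequency
              respond(w1, w1),   # diagonal
              respond(w1, w2)]   # anti-diagonal

    codes = np.zeros(coeffs[0].shape, dtype=np.int64)
    bit = 0
    for c in coeffs:
        for part in (c.real, c.imag):
            codes += (part > 0).astype(np.int64) << bit
            bit += 1

    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Calling `lpq_histogram(spectrogram(signal))` on a one-dimensional array of audio samples yields a 256-dimensional feature vector that could then be passed to a classifier. The classifier and the fusion with i-vectors are not sketched here, since the abstract does not specify them.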
Keywords
Spoken language recognition; Texture image descriptors; Low cost language identification
Copyright information
© Springer International Publishing Switzerland 2015