Language Identification Using Spectrogram Texture

  • Ana Montalvo
  • Yandre M. G. Costa
  • José Ramón Calvo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9423)


This paper proposes a novel front end for automatic spoken language recognition, based on the spectrogram representation of the speech signal and on the properties of the Fourier spectrum for detecting global periodicity in an image. The Local Phase Quantization (LPQ) texture descriptor is used to capture the spectrogram content. Results obtained on 30-second test signals show that this method is very promising for low-cost language identification. The best performance is achieved when the proposed method is fused with the i-vector representation.
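The front end described above turns speech into a spectrogram image and summarizes its texture with LPQ, which encodes the signs of local Fourier coefficients at four low frequencies into an 8-bit code per pixel and histograms those codes. The following is a minimal NumPy sketch of that pipeline, not the authors' implementation. Several choices are assumptions: the framing parameters (25 ms Hann frames with a 10 ms hop at 16 kHz), the 3×3 LPQ window, the basic LPQ variant without the decorrelation step, and the weighted-sum score fusion. The function names `spectrogram_image`, `lpq_histogram` and `fuse_scores` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spectrogram_image(signal, frame_len=400, hop=160, n_fft=512):
    """Log-magnitude spectrogram scaled to an 8-bit gray-level image.

    Assumed parameters: 25 ms frames / 10 ms hop at 16 kHz (not from the paper).
    Requires len(signal) >= frame_len.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)            # (frames, frame_len)
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))      # (frames, n_fft//2 + 1)
    log_mag = 20.0 * np.log10(mag + 1e-10)
    img = log_mag.T[::-1]                                    # frequency vertical, low at bottom
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    return np.round(255 * img).astype(np.uint8)


def lpq_histogram(img, win_size=3):
    """Normalized 256-bin LPQ histogram (basic variant, no decorrelation)."""
    img = np.asarray(img, dtype=np.float64)
    r = (win_size - 1) // 2
    a = 1.0 / win_size
    offsets = np.arange(-r, r + 1)
    rows, cols = np.meshgrid(offsets, offsets, indexing="ij")
    # The four LPQ frequency points as (horizontal, vertical): (a,0), (0,a), (a,a), (a,-a)
    freqs = [(a, 0.0), (0.0, a), (a, a), (a, -a)]
    windows = sliding_window_view(img, (win_size, win_size))  # (H', W', M, M)
    code = np.zeros(windows.shape[:2], dtype=np.int64)
    bit = 0
    for u_x, u_y in freqs:
        basis = np.exp(-2j * np.pi * (u_x * cols + u_y * rows))
        resp = np.einsum("ijkl,kl->ij", windows, basis)      # local Fourier coefficient
        for part in (resp.real, resp.imag):                  # quantize sign of each part
            code |= (part > 0).astype(np.int64) << bit
            bit += 1
    hist = np.bincount(code.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def fuse_scores(scores_lpq, scores_ivector, weight=0.5):
    """Weighted-sum fusion of per-language scores (assumed rule, illustrative only)."""
    s1 = np.asarray(scores_lpq, dtype=np.float64)
    s2 = np.asarray(scores_ivector, dtype=np.float64)
    return weight * s1 + (1.0 - weight) * s2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)
    img = spectrogram_image(x)
    feat = lpq_histogram(img)
    print(img.shape, feat.shape)  # (257, 98) (256,)
```

In practice the 256-dimensional histogram would feed a per-language classifier, and the scores of the LPQ and i-vector systems would need to be calibrated to a common scale before any fusion rule like the one above is meaningful.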


Keywords: Spoken language recognition · Texture image descriptors · Low-cost language identification



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ana Montalvo (1), corresponding author
  • Yandre M. G. Costa (2)
  • José Ramón Calvo (1)
  1. Advanced Technologies Application Center, Havana, Cuba
  2. Department of Informatics, State University of Maringá, Maringá, Brazil
