Combined MFCC-FBCC Features for Unsupervised Query-by-Example Spoken Term Detection

  • Drisya Vasudev
  • Suryakanth V. Gangashetty
  • K. K. Anish Babu
  • K. S. Riyas
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 384)

Abstract

This paper proposes a new set of features for unsupervised spoken term detection: given a large audio database, the objective is to locate occurrences of a spoken query within it. Because unsupervised audio search requires no language-specific resources, the approach is well suited to cases where there is not enough training data to build an Automatic Speech Recognition (ASR) system. Current state-of-the-art techniques use features such as Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC). To improve performance, this paper combines Fourier Bessel Cepstral Coefficients (FBCC) with MFCC. Gaussian Posteriorgrams (GP) are created from the feature vectors of the spoken example of a keyword and of the database audio, and are compared using segmental Dynamic Time Warping. By combining the GPs of the MFCCs and FBCCs, a new feature representation is adopted in this work. Keyword detection results on the MediaEval 2012 database show that the proposed system outperforms one using MFCC alone.
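The abstract outlines a complete pipeline: frame-level features, Gaussian posteriorgrams from an unsupervised GMM, combination of the two posteriorgram streams, and DTW-based matching. The following Python sketch illustrates that pipeline; it is not the authors' implementation. It assumes scikit-learn's GaussianMixture and librosa MFCCs, takes FBCC frames as a precomputed matrix (their extraction is paper-specific), and replaces segmental DTW with plain DTW for brevity; function names such as train_gmm, posteriorgram, dtw_cost and combined_posteriorgram are illustrative.

```python
# Minimal sketch (not the authors' code) of the pipeline described above:
# frame-level features -> Gaussian posteriorgrams -> combination -> DTW matching.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def mfcc_frames(wav, sr, n_mfcc=13):
    """Frame-level MFCCs, shape (frames, n_mfcc)."""
    return librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=n_mfcc).T


def train_gmm(feature_frames, n_components=64, seed=0):
    """Fit one GMM per feature stream on unlabelled training frames."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag",
                           random_state=seed).fit(feature_frames)


def posteriorgram(feature_frames, gmm):
    """Per-frame posterior over GMM components, shape (frames, n_components)."""
    return gmm.predict_proba(feature_frames)


def combined_posteriorgram(gp_mfcc, gp_fbcc):
    """One simple way to combine the two streams: average the posteriorgrams
    frame by frame (the paper's exact combination rule may differ)."""
    return 0.5 * (gp_mfcc + gp_fbcc)


def dtw_cost(query_gp, test_gp):
    """DTW over -log inner-product frame distances, a distance commonly used
    for posteriorgram matching. The paper uses segmental DTW (a banded variant
    with multiple start points); this unrestricted DTW is a simplification."""
    d = -np.log(np.clip(query_gp @ test_gp.T, 1e-12, None))
    n, m = d.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(acc[i - 1, j],
                                              acc[i, j - 1],
                                              acc[i - 1, j - 1])
    return acc[n, m] / (n + m)   # length-normalised matching cost
```

In use, one GMM per feature stream would be trained on unlabelled audio; the query and each database utterance are then represented as combined posteriorgrams and scored with the DTW cost, with low costs flagging candidate occurrences of the spoken term.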

Keywords

Spoken term detection · Query · FBCC · Gaussian mixture · Gaussian posteriorgram · Dynamic time warping



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Drisya Vasudev (1)
  • Suryakanth V. Gangashetty (2)
  • K. K. Anish Babu (1)
  • K. S. Riyas (1)
  1. Department of Electronics and Communication Engineering, Rajiv Gandhi Institute of Technology, Kerala, India
  2. Speech and Vision Lab, IIIT Hyderabad, Hyderabad, India
