Multimedia Tools and Applications

Volume 48, Issue 1, pp 123–140

Content-based search in multilingual audiovisual documents using the International Phonetic Alphabet

  • Georges Quénot (email author)
  • Tien Ping Tan
  • Viet Bac Le
  • Stéphane Ayache
  • Laurent Besacier
  • Philippe Mulhem


We present an approach based on the International Phonetic Alphabet (IPA) for content-based indexing and retrieval of multilingual audiovisual documents. The approach works even if the languages of the document are unknown. It has been validated in the context of the “Star Challenge” search engine competition organized by the Agency for Science, Technology and Research (A*STAR) of Singapore. Our approach includes the building of an IPA-based multilingual acoustic model and a dynamic-programming-based method for searching document segments by “IPA string spotting”. Dynamic programming allows the query string to be retrieved from the document string even with a significant transcription error rate at the phone level. The methods that we developed ranked first and third on the monolingual (English) search task, fifth on the multilingual search task, and first on the multimodal (audio and image) search task.
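As an illustration of how dynamic programming can spot a query phone string inside a longer, error-prone phone transcription, here is a minimal sketch of approximate substring matching over IPA symbols. It is not the authors' implementation: the function name `spot_phone_string`, the unit edit costs, and the example phone sequences are all assumptions made for illustration. A real system would likely use costs reflecting acoustic confusability between phones.

```python
from typing import Sequence, Tuple


def spot_phone_string(
    query: Sequence[str],
    document: Sequence[str],
    sub_cost: float = 1.0,  # assumed cost of a substituted phone
    ins_cost: float = 1.0,  # assumed cost of an extra phone in the document
    del_cost: float = 1.0,  # assumed cost of a query phone missing from the document
) -> Tuple[float, int]:
    """Return (best_cost, end_index) for the best approximate occurrence
    of `query` anywhere in `document`.

    `end_index` is the number of document phones consumed up to and
    including the end of the best match (0 if the document is empty).
    """
    # Column for an empty document prefix: every query phone is unmatched.
    prev = [i * del_cost for i in range(len(query) + 1)]
    best_cost, best_end = prev[-1], 0

    for j, doc_phone in enumerate(document, start=1):
        # Row 0 is always 0: a match may start at any document position.
        curr = [0.0]
        for i, query_phone in enumerate(query, start=1):
            align = prev[i - 1] + (0.0 if query_phone == doc_phone else sub_cost)
            skip_query = curr[i - 1] + del_cost  # query phone not found
            skip_doc = prev[i] + ins_cost        # spurious document phone
            curr.append(min(align, skip_query, skip_doc))
        # The last row holds the cost of the whole query ending at position j.
        if curr[-1] < best_cost:
            best_cost, best_end = curr[-1], j
        prev = curr

    return best_cost, best_end


if __name__ == "__main__":
    # Hypothetical transcription in which "ə" was recognized as "ɛ".
    document = ["s", "eɪ", "h", "ɛ", "l", "oʊ", "w", "ɜ", "l", "d"]
    query = ["h", "ə", "l", "oʊ"]
    print(spot_phone_string(query, document))
```

Because the first row of the table is held at zero, the query may begin at any position in the document, which is what turns ordinary edit distance into string spotting. Segments can then be ranked by their best cost, so a query is still retrieved when a few phones are misrecognized.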


Keywords: Audio retrieval, Multilingual, International Phonetic Alphabet, Dynamic programming, Star Challenge



Part of this work has been supported by the Quaero programme.



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Georges Quénot (1), email author
  • Tien Ping Tan (1)
  • Viet Bac Le (2)
  • Stéphane Ayache (3)
  • Laurent Besacier (1)
  • Philippe Mulhem (1)

  1. Laboratoire d’Informatique de Grenoble, Grenoble Cedex 9, France
  2. LIMSI-CNRS, Orsay Cedex, France
  3. Laboratoire d’Informatique Fondamentale de Marseille, Marseille Cedex 9, France
