Spoken Term Detection Techniques

Searching Speech Databases

Part of the book series: SpringerBriefs in Speech Technology (BRIEFSSPEECHTECH)

Abstract

Audio search technology has improved to the point where an arbitrary spoken word, not specified in advance, can be searched for and retrieved from an audio database with reasonable accuracy. This task is termed Spoken Term Detection (STD). STD can be broadly classified into text-based STD and Query-by-Example STD (QbE-STD). In text-based STD, both the query and the database are converted to corresponding text or symbol sequences for searching. Because this requires a speech-to-text mapping, some form of recognition system is used, which makes the approach challenging for under-resourced languages. In QbE-STD, the query is processed in audio form, and no speech-to-text conversion is used to search the database. There is no limitation on the length of the query word or on its frequency. Based on how they are realized, STD systems are categorized as supervised or unsupervised. This chapter discusses various STD systems.
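
To make the QbE-STD idea concrete, the sketch below matches a spoken query directly against an utterance as sequences of feature frames, with no text conversion. It uses subsequence dynamic time warping, a common template-matching technique chosen here as an illustration and not taken from the chapter itself. The function names `frame_distance` and `subsequence_dtw` and the toy one-dimensional "features" are hypothetical; a real system would use acoustic features or posteriorgrams.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance):
    """Find the best match of `query` anywhere inside `utterance`.

    Both arguments are lists of feature frames (lists of floats).
    Returns (normalized_cost, end_index), where end_index is the
    utterance frame at which the best-matching region ends.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    # First row: the query may start at any utterance frame at no extra cost.
    prev = [frame_distance(query[0], utterance[j]) for j in range(m)]

    for i in range(1, n):
        curr = [math.inf] * m
        for j in range(m):
            best = prev[j]  # query advances, utterance frame repeats
            if j > 0:
                # diagonal step, or utterance advances while query frame repeats
                best = min(best, prev[j - 1], curr[j - 1])
            curr[j] = frame_distance(query[i], utterance[j]) + best
        prev = curr

    # The match may end at any utterance frame; pick the cheapest one.
    end = min(range(m), key=prev.__getitem__)
    return prev[end] / n, end


# Toy example: the query pattern 1, 2, 3 occurs at frames 2..4 of the utterance.
query = [[1.0], [2.0], [3.0]]
utterance = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0]]
cost, end = subsequence_dtw(query, utterance)
print(cost, end)
```

A detection would then be declared when the normalized cost falls below a threshold tuned on held-out data; that threshold is left out here because its value depends on the features and the corpus.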

Copyright information

© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mary, L., G, D. (2019). Spoken Term Detection Techniques. In: Searching Speech Databases. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-97761-4_5

  • DOI: https://doi.org/10.1007/978-3-319-97761-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97760-7

  • Online ISBN: 978-3-319-97761-4

  • eBook Packages: Engineering, Engineering (R0)
