Spoken Term Detection Techniques

Mary, Leena; G, Deekshitha

doi:10.1007/978-3-319-97761-4_5

Part of the book series: SpringerBriefs in Speech Technology (BRIEFSSPEECHTECH)

Abstract

Audio search technology has improved to the point where an arbitrary spoken word can be searched for and retrieved from an audio database with reasonable accuracy. This task is termed Spoken Term Detection (STD). STD can be broadly classified into text-based STD and Query-by-Example STD (QbE-STD). In text-based STD, both the query and the database are converted to a corresponding text or symbol representation for searching. Because this requires a speech-to-text mapping, some form of recognition system is needed, which makes the approach challenging for under-resourced languages. In QbE-STD, the query is processed in audio form, and no speech-to-text conversion is used to search the database. There is no limitation on the length of the query word or on its frequency. Based on how they are realized, STD systems are categorized as supervised or unsupervised. This chapter discusses various STD systems.
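
To make the QbE-STD idea concrete, the following is a minimal sketch of matching a spoken query against an utterance directly on frame-level features, with no text conversion. It assumes subsequence dynamic time warping (DTW), one common choice for this kind of matching, and is not presented as the chapter's own method. The function names, the Euclidean frame distance, and the one-dimensional toy "features" are illustrative assumptions; a real system would use spectral or posteriorgram features per frame.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance):
    """Locate the utterance region that best matches the query.

    Returns (cost, start, end): the alignment cost divided by the query
    length, and the inclusive frame range of the best match in `utterance`.
    """
    n, m = len(query), len(utterance)
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning query[0..i] to a
    # stretch of the utterance that ends at frame j.
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: utterance frame at which that best alignment began.
    start = [[0] * m for _ in range(n)]

    # The match may begin at any utterance frame at no extra cost.
    for j in range(m):
        cost[0][j] = frame_distance(query[0], utterance[j])
        start[0][j] = j

    for i in range(1, n):
        # In the first column only a vertical step is possible.
        cost[i][0] = cost[i - 1][0] + frame_distance(query[i], utterance[0])
        start[i][0] = 0
        for j in range(1, m):
            # Diagonal, vertical, and horizontal predecessors.
            best_cost, best_start = min(
                (cost[i - 1][j - 1], start[i - 1][j - 1]),
                (cost[i - 1][j], start[i - 1][j]),
                (cost[i][j - 1], start[i][j - 1]),
            )
            cost[i][j] = best_cost + frame_distance(query[i], utterance[j])
            start[i][j] = best_start

    # The match may also end at any utterance frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


# Toy data (assumed): one-dimensional "features" per frame.
query = [[0.0], [1.0], [2.0]]
utterance = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]
print(subsequence_dtw(query, utterance))  # prints (0.0, 2, 4)
```

A detection would typically be declared when the normalised cost falls below a threshold tuned on held-out data. Because the start and end points are left free, the query can be found anywhere inside a longer recording, and the warping absorbs differences in speaking rate.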

Author information

Authors and Affiliations

Department of Electronics & Communication Engineering, Government Engineering College, Idukki, Kerala, India
Leena Mary
Department of Electronics and Communication Engineering, Rajiv Gandhi Institute of Technology, Kottayam, Kerala, India
Deekshitha G

About this chapter

Cite this chapter

Mary, L., G, D. (2019). Spoken Term Detection Techniques. In: Searching Speech Databases. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-97761-4_5

DOI: https://doi.org/10.1007/978-3-319-97761-4_5
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97760-7
Online ISBN: 978-3-319-97761-4
eBook Packages: Engineering, Engineering (R0)
