Abstract
Audio search technology has advanced to the point where any arbitrary spoken word can be searched for and retrieved from an audio database with reasonable accuracy. This task is termed Spoken Term Detection (STD). STD can be broadly classified into text-based STD and Query-by-Example STD (QbE-STD). In text-based STD, both the query and the database are converted to corresponding text or symbol sequences for searching. Because this technique needs a speech-to-text mapping, it requires some form of recognition system, which makes it a challenging task for under-resourced languages. In QbE-STD, the query is processed in audio format, and no speech-to-text conversion is used to search the database. There is no restriction on the length of the query word or on how often it occurs. Based on how they are realized, STD systems are categorized as supervised or unsupervised. Various STD systems are discussed in this chapter.
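As a rough illustration of how QbE-STD can search audio without converting it to text, the sketch below uses subsequence dynamic time warping (DTW), a common choice for this kind of template matching, to locate the stretch of a database utterance that best matches a spoken query. It is not necessarily the method described in the chapter. It assumes that both query and database have already been converted into frame-level feature vectors (for example MFCCs or phonetic posteriorgrams) and that cosine distance is a suitable local cost. The function names `cosine_distance` and `subsequence_dtw` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector (e.g. MFCC or posteriorgram frame)


def cosine_distance(a: Frame, b: Frame) -> float:
    """Local cost: 1 - cosine similarity (0 when vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / (na * nb)


def subsequence_dtw(query: List[Frame], doc: List[Frame]) -> Tuple[int, int, float]:
    """Return (start, end, cost) of the doc segment that best matches the query.

    Every query frame must be aligned, but the alignment may begin and end
    at any document frame (free start and free end along the doc axis).
    """
    m, n = len(query), len(doc)
    inf = float("inf")
    # acc[i][j]: lowest accumulated cost of a path ending with query[i] on doc[j]
    acc = [[inf] * n for _ in range(m)]
    # start[i][j]: document frame where that lowest-cost path began
    start = [[0] * n for _ in range(m)]

    # Free start: the first query frame may align with any document frame.
    for j in range(n):
        acc[0][j] = cosine_distance(query[0], doc[j])
        start[0][j] = j

    for i in range(1, m):
        for j in range(n):
            cost = cosine_distance(query[i], doc[j])
            # Allowed predecessors: (i-1, j), (i, j-1) and (i-1, j-1).
            best, s = acc[i - 1][j], start[i - 1][j]
            if j > 0 and acc[i][j - 1] < best:
                best, s = acc[i][j - 1], start[i][j - 1]
            if j > 0 and acc[i - 1][j - 1] < best:
                best, s = acc[i - 1][j - 1], start[i - 1][j - 1]
            acc[i][j] = cost + best
            start[i][j] = s

    # Free end: pick the document frame where the full query ends most cheaply.
    end = min(range(n), key=lambda j: acc[m - 1][j])
    return start[m - 1][end], end, acc[m - 1][end]


if __name__ == "__main__":
    # Toy 2-D "features": the query pattern appears at doc frames 2..3.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(subsequence_dtw(query, doc))
```

A lower cost indicates a closer match. A practical system would typically also normalize the cost (for example by path length) and compare it against a threshold to decide whether the query term occurs in the utterance at all.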
Copyright information
© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Mary, L., G, D. (2019). Spoken Term Detection Techniques. In: Searching Speech Databases. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-97761-4_5
DOI: https://doi.org/10.1007/978-3-319-97761-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97760-7
Online ISBN: 978-3-319-97761-4
eBook Packages: Engineering, Engineering (R0)