Merge-Weighted Dynamic Time Warping for Speech Recognition

Zhang, Xiang-Lilan; Luo, Zhi-Gang; Li, Ming

doi:10.1007/s11390-014-1491-0

Regular Paper
Published: 17 November 2014

Volume 29, pages 1072–1082 (2014)

Journal of Computer Science and Technology

Xiang-Lilan Zhang¹,²,³, Zhi-Gang Luo² & Ming Li³


Abstract

Obtaining training material for rarely used English words and for common given names from countries where English is not spoken is difficult because of the time, storage, and cost involved. With personal privacy in mind, language-independent (LI), lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a convenient way to address the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR in real-time applications with limited storage and small vocabularies. These applications include voice dialing on mobile devices, menu-driven recognition, and voice control in vehicles and robotics. However, traditional DTW has several limitations, such as high computational complexity, constraint-induced coarse approximation, and inaccuracy. In this paper, we introduce the merge-weighted dynamic time warping (MWDTW) algorithm. This method defines a template confidence index for measuring the similarity between merged training data and testing data, while following the core DTW process. MWDTW is simple, efficient, and easy to implement. In extensive experiments on three representative SD speech recognition datasets, we demonstrate that our method significantly outperforms DTW, DTW on merged speech data, and the hidden Markov model (HMM), and is also six times faster than DTW overall.
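
For readers unfamiliar with the baseline, the "core DTW process" that the abstract refers to can be sketched as below. This is the textbook DTW recurrence combined with nearest-template classification, not the MWDTW algorithm itself: the template confidence index and the merging step are not specified in the abstract and are not reproduced here. The function names `dtw_distance` and `recognize`, the optional `window` band parameter, and the Euclidean local distance are all assumptions chosen for illustration.

```python
import math


def dtw_distance(a, b, window=None):
    """Classic DTW cost between two sequences of equal-dimension feature vectors.

    `window` is an optional band half-width (a Sakoe-Chiba style constraint);
    it is an illustrative assumption, not a parameter taken from the paper.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # Without a window, allow every cell; with one, widen it to at least
    # |n - m| so that the final cell (n, m) stays reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(query, templates):
    """Return the label whose stored template is closest to `query` under DTW.

    `templates` maps a label to a list of template sequences (hypothetical layout).
    """
    best_label, best_cost = None, math.inf
    for label, sequences in templates.items():
        for seq in sequences:
            d = dtw_distance(query, seq)
            if d < best_cost:
                best_label, best_cost = label, d
    return best_label
```

The table fill costs O(nm) per query-template pair, which illustrates the high computational complexity mentioned above; restricting the search with `window` reduces that cost but can exclude the true optimal path, which is one way a constraint leads to a coarser approximation.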

Author information

Authors and Affiliations

State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
Xiang-Lilan Zhang
Science and Technology on Parallel and Distributed Processing Laboratory, School of Computer Science, National University of Defense Technology, Changsha, 410073, China
Xiang-Lilan Zhang & Zhi-Gang Luo
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
Xiang-Lilan Zhang & Ming Li

Corresponding authors

Correspondence to Zhi-Gang Luo or Ming Li.

Additional information

This work was supported by the Research Plan Project of National University of Defense Technology under Grant No. JC13-06-01, and by the OCRit Project, made possible by the Global Leadership Round in Genomics & Life Sciences Grant (GL2).

Electronic supplementary material

ESM 1 (PDF, 128 kb)

About this article

Cite this article

Zhang, XL., Luo, ZG. & Li, M. Merge-Weighted Dynamic Time Warping for Speech Recognition. J. Comput. Sci. Technol. 29, 1072–1082 (2014). https://doi.org/10.1007/s11390-014-1491-0

Received: 26 September 2013
Revised: 24 March 2014
Published: 17 November 2014
Issue Date: November 2014
DOI: https://doi.org/10.1007/s11390-014-1491-0
