To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance

Kuo, Zong-Peng; Chen, Joy Iong-Zong

doi:10.1007/s10772-022-09962-z


Published: 08 February 2022

Volume 25, pages 879–891 (2022)

International Journal of Speech Technology


Abstract

This article evaluates a trained speech controller built on a deep neural network long short-term memory (DNN-LSTM) framework that serves as the commander of a smart wheeled-robot. The controller is deployed to remotely operate a smart wheeled-robot previously completed in another project of the authors' research group. Using machine learning techniques, a framework based on the DNN-LSTM model is embedded into the smart wheeled-robot prototype. The control commands are designed for a limited learning circumstance: the speech is constrained to single-track (ST) and double-track (DT) recordings and includes only four Chinese speech commands, "forward" (Chinese "前進"), "backward" (Chinese "後退"), "turn left" (Chinese "左轉"), and "turn right" (Chinese "右轉"). Although only these four simple commands are collected for training, the recognition accuracy is investigated with training data from three separate speakers, each trained for 1000 to 5000 iterations. Only three parameters are considered as the dominant factors in evaluating the performance of the speech-controlled wheeled-robot, which is why the title refers to a "limited learning circumstance". The test results clearly show that the DT set achieves higher accuracy than the ST set. The best testing and validation performance occurs with the DT channel, where the accuracy and loss are 0.673 and 0.018, respectively, with 50% dropout. The dropout ratio applied during training was found to have a dominant effect on both accuracy and loss. Finally, after the accuracy analysis, the trained model for the speech command set is uploaded to a micro-controller and embedded into the smart wheeled-robot as a remote piloting scheme.
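The abstract does not spell out the network layout, so the following is only a rough, hypothetical illustration of what a DNN-LSTM command classifier with dropout computes, not the authors' implementation. It assumes an LSTM reads a sequence of acoustic feature frames (for example MFCC vectors; the actual front end is not described above), inverted dropout at the 50% rate mentioned in the abstract is applied to the final hidden state during training only, and a dense softmax layer scores the four commands. The layer sizes, weight initialization, dropout placement, and the names `init_params`, `lstm_step`, and `classify` are assumptions; training by backpropagation is omitted.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (W: input weights, U: recurrent weights, b: bias)


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _matvec(M: Matrix, v: Sequence[float]) -> Vector:
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def _gate(p: GateParams, x: Sequence[float], h: Sequence[float]) -> Vector:
    """Pre-activation W·x + U·h + b of one LSTM gate."""
    W, U, b = p
    return [wx + uh + bj for wx, uh, bj in zip(_matvec(W, x), _matvec(U, h), b)]


def lstm_step(x: Sequence[float], h_prev: Vector, c_prev: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One LSTM time step: input (i), forget (f), output (o) gates and candidate (g)."""
    i = [_sigmoid(v) for v in _gate(params["i"], x, h_prev)]
    f = [_sigmoid(v) for v in _gate(params["f"], x, h_prev)]
    o = [_sigmoid(v) for v in _gate(params["o"], x, h_prev)]
    g = [math.tanh(v) for v in _gate(params["g"], x, h_prev)]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]  # new cell state
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]                      # new hidden state
    return h, c


def init_params(n_in: int, n_hidden: int, n_out: int, rng: random.Random):
    """Hypothetical small random weights; a deployed model would load trained values."""
    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    lstm = {k: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden) for k in "ifog"}
    dense = (mat(n_out, n_hidden), [0.0] * n_out)
    return lstm, dense


def classify(frames: Sequence[Sequence[float]], lstm: Dict[str, GateParams],
             dense: Tuple[Matrix, Vector], dropout_rate: float = 0.5,
             training: bool = False, rng: Optional[random.Random] = None) -> Vector:
    """Run the LSTM over all frames, then a dense softmax layer over the commands."""
    n_hidden = len(lstm["i"][2])
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in frames:
        h, c = lstm_step(x, h, c, lstm)
    if training and dropout_rate > 0.0:
        # Inverted dropout: zero each unit with probability dropout_rate, rescale survivors.
        rng = rng or random.Random()
        keep = 1.0 - dropout_rate
        h = [hj / keep if rng.random() >= dropout_rate else 0.0 for hj in h]
    W_out, b_out = dense
    logits = [z + bj for z, bj in zip(_matvec(W_out, h), b_out)]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because dropout is active only when `training=True`, the same weights give deterministic outputs at inference time, which is the behaviour needed once the model runs on the robot's controller.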
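The abstract ends with the trained model being uploaded to a micro-controller that pilots the robot. One plausible shape for that last step is sketched below, assuming the classifier emits one probability per command in a fixed order and the robot uses a differential-drive base. The label order, the wheel-speed values, the 0.6 confidence threshold, and the function name `to_wheel_command` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# The four commands from the abstract; this output order is an assumption.
LABELS: List[str] = ["前進", "後退", "左轉", "右轉"]  # forward, backward, turn left, turn right

# Hypothetical normalized (left, right) wheel speeds for a differential-drive base.
WHEEL_SPEEDS: Dict[str, Tuple[float, float]] = {
    "前進": (1.0, 1.0),    # both wheels forward
    "後退": (-1.0, -1.0),  # both wheels backward
    "左轉": (-0.5, 0.5),   # spin in place to the left
    "右轉": (0.5, -0.5),   # spin in place to the right
}


def to_wheel_command(probs: Sequence[float],
                     threshold: float = 0.6) -> Optional[Tuple[float, float]]:
    """Map one softmax vector to wheel speeds, or None if the best class is not confident enough."""
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[best] < threshold:
        return None  # ignore uncertain utterances instead of moving the robot
    return WHEEL_SPEEDS[LABELS[best]]


if __name__ == "__main__":
    print(to_wheel_command([0.05, 0.05, 0.85, 0.05]))  # (-0.5, 0.5): turn left
```

Returning `None` below the threshold is a safety-oriented design choice: with a recognition accuracy in the range reported above, rejecting low-confidence outputs can filter out some misrecognized commands, at the cost of occasionally ignoring a valid one.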



Author information

Authors and Affiliations

Department of Electrical Engineering, Da-Yeh University, 168 University Rd., Dasuen, Changhua, 51505, Taiwan
Zong-Peng Kuo & Joy Iong-Zong Chen


Corresponding author

Correspondence to Joy Iong-Zong Chen.


About this article

Cite this article

Kuo, ZP., Chen, J.IZ. To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance. Int J Speech Technol 25, 879–891 (2022). https://doi.org/10.1007/s10772-022-09962-z


Received: 18 June 2021
Accepted: 24 December 2021
Published: 08 February 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10772-022-09962-z
