To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance

Published in: International Journal of Speech Technology

Abstract

This article assesses a trained speech controller built on a deep neural network with long short-term memory (DNN-LSTM) and deployed as the command interface for a smart wheeled robot. The recognition framework remotely controls a wheeled-robot prototype completed in an earlier project by the authors' research group, with the DNN-LSTM model embedded directly in the prototype. The control commands are designed under a limited learning circumstance: the training data are constrained to single-track (ST) and double-track (DT) recordings of only four Chinese speech commands, "forward" (Chinese "前進"), "backward" (Chinese "後退"), "turn left" (Chinese "左轉"), and "turn right" (Chinese "右轉"). Although only these four simple utterances are collected for training, recognition accuracy is investigated across three separate speakers, each with 1000 to 5000 training iterations. Only three parameters govern the performance evaluation of the speech-controlled wheeled robot, which is why the title refers to a "limited learning circumstance." The test results clearly show that the DT set achieves higher accuracy than the ST set. The best testing and validation performance occurs for the DT channel, where the accuracy and loss are 0.673 and 0.018, respectively, with 50% dropout; indeed, the dropout ratio applied during training is found to dominate both the accuracy and the loss. Finally, after the accuracy analysis, the trained speech-command model is uploaded to a micro-controller and embedded in the smart wheeled robot to serve as a remote piloting scheme.
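The abstract does not give the exact network layout, so the following is only a minimal NumPy sketch of the general idea: an LSTM runs over a sequence of audio feature frames, inverted dropout (here 50%, matching the best-performing case reported above) is applied to the final hidden state during training, and a dense softmax head maps that state onto the four command classes. The feature size (`N_MFCC`), hidden width, and all weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 4   # "forward", "backward", "turn left", "turn right"
N_MFCC = 13     # assumed per-frame feature size (not stated in the abstract)
HIDDEN = 32     # assumed hidden-state width
DROPOUT = 0.5   # 50% dropout, as in the best-performing DT case

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised LSTM parameters for the four gates (i, f, g, o),
# stacked row-wise into single matrices for compactness.
W = rng.standard_normal((4 * HIDDEN, N_MFCC)) * 0.1
U = rng.standard_normal((4 * HIDDEN, HIDDEN)) * 0.1
b = np.zeros(4 * HIDDEN)

# Dense softmax head mapping the final hidden state to the 4 commands.
W_out = rng.standard_normal((N_CLASSES, HIDDEN)) * 0.1
b_out = np.zeros(N_CLASSES)

def lstm_classify(frames, training=False):
    """Run an LSTM over a (T, N_MFCC) frame sequence; return class probabilities."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x in frames:
        z = W @ x + U @ h + b
        i = sigmoid(z[0 * HIDDEN:1 * HIDDEN])  # input gate
        f = sigmoid(z[1 * HIDDEN:2 * HIDDEN])  # forget gate
        g = np.tanh(z[2 * HIDDEN:3 * HIDDEN])  # candidate cell state
        o = sigmoid(z[3 * HIDDEN:4 * HIDDEN])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    if training:  # inverted dropout on the final hidden state
        mask = rng.random(HIDDEN) >= DROPOUT
        h = h * mask / (1.0 - DROPOUT)
    logits = W_out @ h + b_out
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# A synthetic 50-frame utterance stands in for a real MFCC sequence.
probs = lstm_classify(rng.standard_normal((50, N_MFCC)))
```

On the robot, the class with the highest probability would be mapped to the corresponding motor command; in a deployed system the weights would of course come from the training procedure described in the article, not random initialisation.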




Corresponding author

Correspondence to Joy Iong-Zong Chen.


Cite this article

Kuo, Z.-P., & Chen, J. I.-Z. To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance. International Journal of Speech Technology, 25, 879–891 (2022). https://doi.org/10.1007/s10772-022-09962-z
