Abstract
This article assesses a trained speech controller that uses a deep neural network long short-term memory (DNN-LSTM) framework as the commander of a smart wheeled robot. The recognizer is deployed to remotely control a smart wheeled robot that was completed previously in another project of the authors' research group. Using machine learning, a framework built on the DNN-LSTM model is embedded into the smart wheeled-robot prototype. The control commands are designed for a limited learning circumstance: the speech is constrained to single-track (ST) and double-track (DT) recordings and includes only four Chinese commands, "forward" (Chinese "前進"), "backward" (Chinese "後退"), "turn left" (Chinese "左轉"), and "turn right" (Chinese "右轉"). Although only these four simple utterances are collected for training, recognition accuracy is investigated with three different persons, each with 1000 to 5000 training iterations. Only three parameters (which is why the title refers to a "limited learning circumstance") are treated as the dominant factors in evaluating the performance of the speech-controlled wheeled robot. The test results clearly show that the DT set achieves higher accuracy than the ST set. The best testing and validation performance occurs with the DT channel, for which the accuracy and loss rate are 0.673 and 0.018, respectively, with 50% dropout. The dropout ratio, however, was found to clearly dominate the accuracy and loss rate when it is applied during training. Finally, after the accuracy analysis, the trained speech-command model is uploaded to a microcontroller and embedded into the smart wheeled robot, where it serves as a remote piloting scheme.
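The abstract names the model family (DNN-LSTM), the four output commands and the 50% dropout, but it does not give layer sizes, input features or training code. Under those gaps, the following is a minimal pure-Python sketch of the general technique: a one-layer LSTM reads a sequence of per-frame feature vectors, a dense softmax layer scores the four commands, and inverted dropout is applied only during training. The class name `TinyLstmClassifier`, the hidden size, the random initialization, the label order and the placeholder input frames are all hypothetical and are not the authors' implementation.

```python
import math
import random

# The four commands come from the abstract; this label order is an assumption.
COMMANDS = ["forward", "backward", "turn left", "turn right"]  # 前進, 後退, 左轉, 右轉


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLstmClassifier:
    """Hypothetical one-layer LSTM followed by a dense softmax layer over COMMANDS."""

    def __init__(self, input_size, hidden_size, dropout=0.5, seed=0):
        self.rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.dropout = dropout
        # Per gate (input i, forget f, output o, candidate g): input weights W,
        # recurrent weights U and bias b.
        self.gates = {
            name: (rand_matrix(hidden_size, input_size, self.rng),
                   rand_matrix(hidden_size, hidden_size, self.rng),
                   [0.0] * hidden_size)
            for name in ("i", "f", "o", "g")
        }
        self.w_out = rand_matrix(len(COMMANDS), hidden_size, self.rng)
        self.b_out = [0.0] * len(COMMANDS)

    def _gate(self, name, x, h, act):
        w, u, b = self.gates[name]
        return [act(a + r + c) for a, r, c in zip(matvec(w, x), matvec(u, h), b)]

    def _dropout(self, h, training):
        # Inverted dropout: only active in training; kept units are scaled by 1/(1-p)
        # so that no rescaling is needed at inference time.
        if not training or self.dropout == 0.0:
            return h
        keep = 1.0 - self.dropout
        return [v / keep if self.rng.random() < keep else 0.0 for v in h]

    def forward(self, frames, training=False):
        """frames: sequence of per-frame feature vectors. Returns class probabilities."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in frames:
            i = self._gate("i", x, h, sigmoid)
            f = self._gate("f", x, h, sigmoid)
            o = self._gate("o", x, h, sigmoid)
            g = self._gate("g", x, h, math.tanh)
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        h = self._dropout(h, training)  # dropout on the final hidden state
        logits = [z + b for z, b in zip(matvec(self.w_out, h), self.b_out)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, frames):
        """Return the command label with the highest probability (inference mode)."""
        probs = self.forward(frames, training=False)
        return COMMANDS[max(range(len(probs)), key=probs.__getitem__)]


if __name__ == "__main__":
    model = TinyLstmClassifier(input_size=2, hidden_size=8)
    # Placeholder frames: the feature type is not specified in the abstract.
    utterance = [[0.1 * t, -0.05 * t] for t in range(20)]
    print(model.predict(utterance))
```

Because the dropout mask is drawn only when `training=True`, the dropout ratio changes what the network learns while leaving inference deterministic, which is the usual convention in deep-learning frameworks. An untrained model like this one outputs an arbitrary command; the weights would have to be fitted on the recorded ST or DT utterances before the prediction means anything.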
About this article
Cite this article
Kuo, ZP., Chen, J.IZ. To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance. Int J Speech Technol 25, 879–891 (2022). https://doi.org/10.1007/s10772-022-09962-z
DOI: https://doi.org/10.1007/s10772-022-09962-z