A novel integration architecture for speech-based human–computer interaction (HCI) was developed using a progressive growth framework and semantic-aware computing. The architecture integrates heterogeneous services and accommodates the diversity of Internet of Things (IoT) platforms. A natural language understanding (NLU) agent is proposed as the controller of IoT hubs and hybrid cloud services. With semantic-aware computing, the NLU agent performs context-sensitive topic correlation and user intent analysis. Through its modular design, the progressive growth framework lets the NLU agent converse on many different topics, such as current affairs and music. Local and cloud services, such as IoT platforms and hybrid cloud services, can be loaded on demand. Three daily-life applications were developed as case studies to demonstrate the architecture's potential and value. With the proposed integration architecture, users can develop applications tailored to their own needs across various industries.
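The modular, load-on-demand idea described in the abstract can be pictured with a minimal Python sketch. It is not the author's implementation: the names `ServiceModule`, `NLUAgent`, `load_light_service`, and `load_music_service` are hypothetical, and the keyword-overlap intent matcher is only a simple stand-in for the semantic-aware computing used in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

Handler = Callable[[str], str]


@dataclass
class ServiceModule:
    """Hypothetical pluggable module; its handler is loaded on first use."""
    name: str
    keywords: Set[str]
    loader: Callable[[], Handler]
    handler: Optional[Handler] = None

    def handle(self, utterance: str) -> str:
        # Load the underlying local or cloud service only when it is needed.
        if self.handler is None:
            self.handler = self.loader()
        return self.handler(utterance)


@dataclass
class NLUAgent:
    """Hypothetical controller that routes an utterance to one module."""
    modules: Dict[str, ServiceModule] = field(default_factory=dict)

    def register(self, module: ServiceModule) -> None:
        # New modules can be added at any time without touching existing ones.
        self.modules[module.name] = module

    def detect_intent(self, utterance: str) -> Optional[str]:
        # Assumption: keyword overlap stands in for real intent analysis.
        tokens = set(utterance.lower().split())
        best_name, best_score = None, 0
        for name, module in self.modules.items():
            score = len(tokens & module.keywords)
            if score > best_score:
                best_name, best_score = name, score
        return best_name

    def respond(self, utterance: str) -> str:
        intent = self.detect_intent(utterance)
        if intent is None:
            return "Sorry, I cannot help with that yet."
        return self.modules[intent].handle(utterance)


def load_light_service() -> Handler:
    # Placeholder for connecting to an IoT hub.
    return lambda utterance: "IoT hub: light command sent"


def load_music_service() -> Handler:
    # Placeholder for connecting to a cloud music service.
    return lambda utterance: "Cloud service: playing music"


agent = NLUAgent()
agent.register(ServiceModule("light", {"light", "lamp"}, load_light_service))
agent.register(ServiceModule("music", {"music", "song", "play"}, load_music_service))
```

Calling `agent.respond("turn on the light")` routes the request to the light module and loads its handler at that moment, while the music module stays unloaded until an utterance first matches it. Registering a further `ServiceModule` extends the agent without changing the existing modules, which is the sense in which such a design can grow progressively.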
This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number MOST 108-2218-E-025-002-MY3]. Special thanks go to Mr. Yu-Ting Hsiao for his assistance with the programming for this study and to Miss Ching-Yi Chiou for her assistance in proofreading this paper.
Cite this article
Chang, JW. Enabling progressive system integration for AIoT and speech-based HCI through semantic-aware computing. J Supercomput (2021). https://doi.org/10.1007/s11227-021-03996-x
Keywords
- Natural language processing
- Natural language understanding
- Speech-based HCI
- IoT hub
- Hybrid cloud services