Enabling progressive system integration for AIoT and speech-based HCI through semantic-aware computing


A novel integration architecture for speech-based human–computer interaction was developed using a progressive growth framework and semantic-aware computing. The architecture can integrate heterogeneous services and addresses the diversity of Internet of Things (IoT) platforms. A natural language understanding (NLU) agent is proposed as the controller of IoT hubs and hybrid cloud services. Through semantic-aware computing, the NLU agent achieves context-sensitive topic correlation and user-intent analysis. Its modularized design under the progressive growth framework allows the agent to converse on many topics, such as current affairs and music, and to load local and cloud services, such as IoT platforms and hybrid cloud services, on user demand. Three daily-life applications were developed as case studies to demonstrate the architecture's potential and value. With the proposed integration architecture, users can build valuable applications tailored to their demands across various industries.
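The modularized, progressively growable design described above can be illustrated with a minimal sketch: service modules (music, IoT hub, and so on) register with an NLU agent, which routes each utterance to a handler by intent. All names here (`NLUAgent`, `register`, `handle`) are illustrative assumptions, and the keyword matcher stands in for the paper's semantic-aware intent analysis, which it does not reproduce.

```python
# Sketch of a pluggable NLU-agent dispatcher, assuming a simple
# keyword-based intent matcher in place of a real NLU model.
from typing import Callable, Dict, List

class NLUAgent:
    """Routes user utterances to pluggable service modules by intent."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}
        self._keywords: Dict[str, str] = {}  # keyword -> intent name

    def register(self, intent: str, keywords: List[str],
                 handler: Callable[[str], str]) -> None:
        """Load a new local or cloud service module on demand."""
        self._handlers[intent] = handler
        for kw in keywords:
            self._keywords[kw] = intent

    def handle(self, utterance: str) -> str:
        """Naive keyword matching; a real system would use an NLU model."""
        text = utterance.lower()
        for kw, intent in self._keywords.items():
            if kw in text:
                return self._handlers[intent](utterance)
        return "Sorry, I did not understand that."

agent = NLUAgent()
agent.register("music", ["play", "song"], lambda u: "Streaming music...")
agent.register("iot.light", ["light", "lamp"],
               lambda u: "Turning on the living-room light.")

print(agent.handle("Please play a song"))        # routed to the music module
print(agent.handle("Turn on the lamp, please"))  # routed to the IoT hub module
```

Because modules are registered rather than hard-coded, new topics and services can be added without changing the dispatcher, which is the essence of the progressive growth idea.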


Figs. 1–44 (figures not included in this text)




Acknowledgements

This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number MOST 108-2218-E-025-002-MY3]. Special thanks to Mr. Yu-Ting Hsiao for his assistance with software development for this study, and to Miss Ching-Yi Chiou for her assistance in proofreading this paper.

Author information



Corresponding author

Correspondence to Jia-Wei Chang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Chang, JW. Enabling progressive system integration for AIoT and speech-based HCI through semantic-aware computing. J Supercomput (2021).



Keywords

  • Natural language processing
  • Natural language understanding
  • Speech-based HCI
  • IoT hub
  • Hybrid cloud services