Speech to speech interaction system using Multimedia Tools and Partially Observable Markov Decision Process for visually impaired students

Lokesh, S.; Kanisha, B.; Nalini, S.; Ramya Devi, M.; Kumar, R.

doi:10.1007/s11042-018-6264-2

Published: 23 June 2018

Volume 79, pages 5023–5042, (2020)

Multimedia Tools and Applications

S. Lokesh¹,
B. Kanisha²,
S. Nalini³,
M. Ramya Devi⁴ &
R. Kumar⁵

Abstract

In general, visually impaired students need another person to teach them with the help of computers and books. However, many students do not know how to use computers or cannot work through the concepts on their own. To address this, a speech-to-speech interaction system is developed on the basis of a novel dialogue management system. The interaction combines multimedia tools with a Partially Observable Markov Decision Process (POMDP) and an agenda-based model; the proposed dialogue management system uses them to learn the speech signals from the user, and the system replies accordingly. The proposed system helps visually impaired students to learn easily. Word Error Rate, Recognition cum Retrieval Rate, and Misrecognition Retrieval Rate are calculated for the proposed POMDP with Agenda Based dialogue management system. The experimental results are compared with those of a Finite-State Based dialogue management system, a Frame Based dialogue management system, and a Probabilistic dialogue management system. The experimental results demonstrate the good performance of the proposed POMDP with Agenda Based dialogue management system. The proposed model was trained with 125 speakers, of whom 46 were visually impaired, and tested with 95 untrained speakers, of whom 32 were visually impaired.
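The abstract reports Word Error Rate without stating its formula. The following is a minimal sketch that assumes the conventional definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the sample sentences are illustrative and do not come from the paper. The other two metrics are not defined in the abstract, so they are not sketched here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / number of reference words.

    This is the textbook definition, assumed here; the paper's exact
    scoring procedure is not given in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical utterance: one of four reference words is dropped.
    print(word_error_rate("open the physics chapter", "open physics chapter"))
```

Because the denominator is the reference length, the value can exceed 1.0 when the recognizer inserts many extra words.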

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Hindusthan Institute of Technology, Coimbatore, India
S. Lokesh
Department of Information Technology, Indra Ganesan College of Engineering, Trichirappalli, India
B. Kanisha
Department of Computer Science & Engineering, Velammal Institute of Technology, Chennai, India
S. Nalini
Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology, Coimbatore, India
M. Ramya Devi
Department of Information Technology, Sri Ramakrishna Institute of Technology, Coimbatore, India
R. Kumar

Corresponding author

Correspondence to S. Lokesh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lokesh, S., Kanisha, B., Nalini, S. et al. Speech to speech interaction system using Multimedia Tools and Partially Observable Markov Decision Process for visually impaired students. Multimed Tools Appl 79, 5023–5042 (2020). https://doi.org/10.1007/s11042-018-6264-2

Received: 08 April 2018
Revised: 18 May 2018
Accepted: 11 June 2018
Published: 23 June 2018
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11042-018-6264-2