Abstract
Visually impaired students generally need another person to teach them with the help of computers and books. However, many students are not familiar with using computers and cannot grasp the concepts on their own. To address this issue, a speech-to-speech interaction system is developed on the basis of a novel dialogue management system. The interaction combines multimedia tools with a Partially Observable Markov Decision Process (POMDP) and an agenda-based model, which the proposed dialogue management system uses to learn from the user's speech signals so that the system can reply accordingly. The proposed system thereby helps visually impaired students learn more easily. Word Error Rate, Recognition cum Retrieval Rate, and Misrecognition Retrieval Rate are calculated for the proposed POMDP with Agenda Based dialogue management system. The experimental results are compared with those of a Finite-State Based dialogue management system, a Frame Based dialogue management system, and a Probabilistic dialogue management system. The comparison demonstrates the good performance of the proposed POMDP with Agenda Based dialogue management system. The proposed model was trained with 125 speakers, of whom 46 were visually impaired, and tested with 95 untrained speakers, of whom 32 were visually impaired.
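For readers unfamiliar with the first of these metrics, the sketch below computes Word Error Rate in its standard form, WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the recognized text into the reference, and N is the number of reference words. This is an illustrative assumption about how the metric is usually computed, not the authors' implementation; the function name and the sample utterances are hypothetical. The other two metrics are not defined in the abstract, so they are not sketched here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Hypothetical helper for illustration; not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("open the physics chapter", "open a physics chapter"))
```

Because insertions are counted, the value can exceed 1.0 when the recognizer outputs many more words than the reference contains.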
Cite this article
Lokesh, S., Kanisha, B., Nalini, S. et al. Speech to speech interaction system using Multimedia Tools and Partially Observable Markov Decision Process for visually impaired students. Multimed Tools Appl 79, 5023–5042 (2020). https://doi.org/10.1007/s11042-018-6264-2
DOI: https://doi.org/10.1007/s11042-018-6264-2