Abstract
Effective human-computer interaction requires both speech recognition and voice response. In this paper we present a concatenative Speech-Text-Speech (STS) system and discuss the issues relevant to developing seamless human-computer interaction. The STS system allows visually impaired people to interact with the computer by giving and receiving voice commands. Audio samples are collected from individual users and transcribed to text. A text file stores the meanings associated with the transcribed texts. In the synthesis phase, the sentences retrieved from the text file are converted to speech using unit selection synthesis. The proposed method leads to seamless human-computer interaction.
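The abstract's pipeline (recognize speech, look the transcript up in a meanings file, synthesize the stored sentence) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `recognize` and `synthesize` are stand-in stubs for a real ASR engine and a unit-selection TTS back end, and the one-entry-per-line `word : meaning` file format is an assumption.

```python
def recognize(audio_sample: str) -> str:
    """Stub ASR step: in the paper, a user's audio sample is transcribed
    to text. Here we pretend the audio is already its transcript."""
    return audio_sample.strip().lower()

def load_meanings(lines):
    """Parse the text file that maps transcribed words to their meanings.
    Assumed format, one entry per line:  word : meaning sentence"""
    table = {}
    for line in lines:
        word, _, meaning = line.partition(":")
        table[word.strip().lower()] = meaning.strip()
    return table

def synthesize(sentence: str) -> str:
    """Stub for unit-selection synthesis: a real system would concatenate
    recorded units; here we just return the sentence to be spoken."""
    return sentence

def sts_respond(audio_sample, meanings):
    """Full STS round trip: speech -> text -> meaning lookup -> speech."""
    text = recognize(audio_sample)
    sentence = meanings.get(text, "no entry found")
    return synthesize(sentence)

# Example run with a tiny hypothetical meanings file.
meanings = load_meanings(["computer : an electronic device for processing data"])
print(sts_respond("Computer", meanings))
```

In a deployed system the dictionary lookup would be replaced by whatever sentence store the synthesizer draws its units from; the stubs only mark where those components plug in.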
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Kanisha, J., Balakrishanan, G. (2011). Speech Transaction for Blinds Using Speech-Text-Speech Conversions. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. CCSIT 2011. Communications in Computer and Information Science, vol 131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17857-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17856-6
Online ISBN: 978-3-642-17857-3