Abstract
This chapter presents a detailed analysis of the durations of sound units. Syllable durations are analyzed with respect to positional and contextual factors. For the detailed analysis, syllables are categorized into groups based on the size of the word and the position of the word in the utterance, and the analysis is performed separately on each category. The analysis shows that the durations of sound units depend on several factors operating at various levels, which makes it difficult to derive precise rules for accurate duration estimation. This motivates the exploration of nonlinear models to capture the duration patterns of sound units from the features discussed in this chapter.
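The per-category analysis described above can be sketched as follows. This is a minimal illustration, not the chapter's actual procedure: the record fields (syllable label, duration in milliseconds, word size in syllables, word position in the utterance) and the sample values are hypothetical, chosen only to show how durations would be grouped by word size and word position before computing per-category statistics.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (syllable, duration_ms, word_size, word_position),
# where word_size is the number of syllables in the word and word_position
# is the word's position in the utterance.
records = [
    ("ka", 145.0, 2, "initial"),
    ("ma", 160.0, 2, "initial"),
    ("ra", 120.0, 3, "medial"),
    ("tha", 180.0, 3, "final"),
    ("na", 170.0, 3, "final"),
]

# Group durations by (word size, word position), so each category
# can be analyzed separately.
groups = defaultdict(list)
for syllable, dur, size, pos in records:
    groups[(size, pos)].append(dur)

# Summarize each category with count, mean, and population std. dev.
for (size, pos), durs in sorted(groups.items()):
    print(f"{size}-syllable word, {pos}: n={len(durs)}, "
          f"mean={mean(durs):.1f} ms, sd={pstdev(durs):.1f} ms")
```

With real data, each category's distribution would then be examined against the positional and contextual factors the chapter analyzes.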
© 2012 Springer Science+Business Media New York
Cite this chapter
Rao, K.S. (2012). Analysis of Durations of Sound Units. In: Predicting Prosody from Text for Text-to-Speech Synthesis. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1338-7_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1337-0
Online ISBN: 978-1-4614-1338-7