Le traitement du signal vocal (Voice signal processing)

Combescure, Pierre; Le Guyader, Alain; Jouvet, Denis; Sorin, Christel

doi:10.1007/BF03000774

Published: January 1995

Annales Des Télécommunications, volume 50, pages 142–164 (1995)

Résumé

Le traitement de la parole a connu ces dernières années un formidable développement lié aux avancées technologiques des composants de traitement numérique des signaux et à la numérisation grandissante des réseaux. Cet article fournit une analyse des principales techniques qui se sont imposées récemment dans les domaines du codage, de la reconnaissance et de la synthèse de la parole. En compression de débit, l'accent est mis sur le codage par analyse/synthèse excité par code (code-excited linear prediction, CELP) qui domine les recherches actuelles dans une gamme de débits allant de 4 à 16 kbit/s. En reconnaissance de parole, on insiste sur l'adaptation aux lignes téléphoniques, le rejet des entrées parasites et la détection de mots-clés, trois éléments essentiels pour augmenter la robustesse des systèmes. En synthèse de la parole à partir du texte, la technique PSOLA (pitch synchronous overlap and add), qui a donné naissance à une nouvelle génération de systèmes de synthèse au timbre très naturel, est détaillée. L'analyse des tendances actuelles permet de dégager quelques axes prometteurs pour de futures recherches.

Abstract

Speech processing has advanced rapidly in recent years, spurred by great progress in VLSI technologies and in the digitization of networks. This paper offers an overview of the techniques that have been the focus of recent research and development in speech coding, recognition and synthesis. For speech compression, the emphasis is on a family of techniques named code-excited linear prediction (CELP), which dominates current studies for rates in the range of 4 to 16 kbit/s. For speech recognition, particular emphasis is placed on three elements that are essential to increasing the robustness of systems: telephone line adaptation, rejection of parasitic noise and out-of-vocabulary words, and keyword spotting. For text-to-speech synthesis, the PSOLA (pitch synchronous overlap and add) technique is outlined. This technique has given rise to a new generation of synthesis systems that produce speech with a very natural timbre. The analysis of current trends in each area suggests promising directions for future research.
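To make the analysis-by-synthesis idea behind code-excited coding concrete, the following is a minimal sketch, not the coder described in the article. It assumes a fixed codebook of random excitation vectors, a hypothetical second-order all-pole synthesis filter (`lpc`), and a plain squared-error criterion. Real CELP coders add perceptual weighting, an adaptive (pitch) codebook and quantized gains, all of which are omitted here. The names `synthesize` and `celp_search` are illustrative only.

```python
import random


def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] + sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def celp_search(target, codebook, lpc):
    """Exhaustive codebook search.

    Each codevector is passed through the synthesis filter, scaled by its
    least-squares optimal gain, and compared with the target frame.
    Returns (index, gain, squared_error) of the best candidate.
    """
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # gain minimizing ||target - gain * y||^2
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Toy setup (all values are assumptions chosen for illustration).
rng = random.Random(0)
lpc = [0.9, -0.2]  # hypothetical stable predictor (poles at 0.5 and 0.4)
codebook = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(64)]

# Build a target frame that is exactly codevector 17 scaled by 2.5,
# so the search has a known correct answer.
target = [2.5 * v for v in synthesize(codebook[17], lpc)]
index, gain, error = celp_search(target, codebook, lpc)
```

Only the winning index and gain would need to be transmitted, since the decoder holds the same codebook and filter. This is the property that makes the approach attractive at low bit rates.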

Author information

Authors and Affiliations

France Télécom, cnet laa, route de Trégastel, F-22301, Lannion, France
Pierre Combescure, Alain Le Guyader, Denis Jouvet & Christel Sorin

About this article

Cite this article

Combescure, P., Le Guyader, A., Jouvet, D. et al. Le traitement du signal vocal (Voice signal processing). Ann. Télécommun. 50, 142–164 (1995). https://doi.org/10.1007/BF03000774
