CNN: A speaker recognition system using a cascaded neural network

  • M. Zaki
  • A. Ghalwash
  • A. A. Elkouny


This work covers the design and implementation of both conventional and neural network approaches to recognizing speaker templates. The templates are introduced to the system via a voice master card and are preprocessed before the features used in recognition are extracted. The conclusion is that the neural network approach outperforms the conventional one: it degrades smoothly when dealing with noisy patterns and achieves higher performance on noise-free patterns.

Key Words

Speaker recognition, neural network, linear prediction coding, cepstrum analysis, unsupervised learning, Kohonen's self organising map
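The abstract does not say how the features are computed, so the following is only a minimal sketch of two of the listed key words, linear prediction coding and cepstrum analysis, not the authors' implementation. It assumes the autocorrelation method for the predictor and the standard recursion from predictor coefficients to cepstral coefficients for an all-pole model; the function names `lpc_coefficients` and `lpc_to_cepstrum`, the frame length, and the model order are illustrative choices.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC (assumed variant, not taken from the paper).

    Returns a[0..order-1] such that s[n] is approximated by
    sum_k a[k-1] * s[n-k] for k = 1..order.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1 / (1 - sum_k a[k] z^-k).

    Uses c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k],
    with a[n] taken as 0 for n greater than the model order.
    """
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


# Hypothetical test frame: a decaying exponential, i.e. a one-pole signal.
frame = 0.9 ** np.arange(200)
a = lpc_coefficients(frame, order=2)
c = lpc_to_cepstrum(a, n_ceps=3)
```

For this one-pole frame the predictor should recover a pole near 0.9, and the cepstral coefficients should follow 0.9^n / n. In a recognition setting, vectors like `c` computed per speech frame could serve as the features fed to either a conventional template matcher or a neural network.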





Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • M. Zaki (1)
  • A. Ghalwash (2)
  • A. A. Elkouny (3)

  1. Faculty of Engineering, Al-Azhar University, Nasr City, Cairo, Egypt
  2. Military Technical College, Egypt
  3. Air Defense College, Egypt
