Perceptive, Non-linear Speech Processing and Spiking Neural Networks

Rouat, Jean; Pichevar, Ramin; Loiselle, Stéphane

doi:10.1007/11520153_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Included in the following conference series:

International School on Neural Networks, Initiated by IIASS and EMFCSC

Abstract

Source separation and speech recognition are very difficult when the speech is noisy or corrupted. Most conventional techniques need huge databases to estimate the probability densities of speech (or noise) before they can perform separation or recognition. We discuss the potential of perceptive speech analysis and processing in combination with biologically plausible neural network processors. We illustrate the potential of such non-linear processing of speech with a source separation system inspired by an Auditory Scene Analysis paradigm. We also discuss a potential application in speech recognition.

Author information

Authors and Affiliations

Université de Sherbrooke
Jean Rouat, Ramin Pichevar & Stéphane Loiselle

Editor information

Editors and Affiliations

CNRS LTCI/TSI Paris, 46 rue Barrault, 75634, Paris Cedex 13, France
Gérard Chollet
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Escola Universitària Politècnica de Mataró, Universitat Politècnica de Catalunya, Barcelona, Spain
Marcos Faundez-Zanuy
Dipartimento di Fisica “E.R. Caianiello”, Università degli Studi di Salerno, Via S. Allende, 84081, Baronissi, SA, Italy
Maria Marinaro

Cite this paper

Rouat, J., Pichevar, R., Loiselle, S. (2005). Perceptive, Non-linear Speech Processing and Spiking Neural Networks. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_14

DOI: https://doi.org/10.1007/11520153_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science, Computer Science (R0)
