Abstract
Source separation and speech recognition are very difficult when speech is noisy or corrupted. Most conventional techniques need huge databases to estimate the probability densities of speech (or noise) before they can perform separation or recognition. We discuss the potential of perceptive speech analysis and processing in combination with biologically plausible neural network processors. We illustrate the potential of such non-linear speech processing with a source separation system inspired by the Auditory Scene Analysis paradigm. We also discuss a potential application in speech recognition.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Rouat, J., Pichevar, R., Loiselle, S. (2005). Perceptive, Non-linear Speech Processing and Spiking Neural Networks. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_14
DOI: https://doi.org/10.1007/11520153_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science, Computer Science (R0)