Perceptive, Non-linear Speech Processing and Spiking Neural Networks

  • Conference paper
Nonlinear Speech Modeling and Applications (NN 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Abstract

Source separation and speech recognition are very difficult in the context of noisy and corrupted speech. Most conventional techniques need huge databases to estimate the probability densities of speech (or noise) before they can perform separation or recognition. We discuss the potential of perceptive speech analysis and processing in combination with biologically plausible neural network processors. We illustrate the potential of such non-linear processing of speech with a source separation system inspired by an Auditory Scene Analysis paradigm. We also discuss a potential application in speech recognition.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rouat, J., Pichevar, R., Loiselle, S. (2005). Perceptive, Non-linear Speech Processing and Spiking Neural Networks. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_14

  • DOI: https://doi.org/10.1007/11520153_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27441-4

  • Online ISBN: 978-3-540-31886-6

  • eBook Packages: Computer Science, Computer Science (R0)
