Generalized Mutual Interdependence Analysis of Noisy Channels

Excursions in Harmonic Analysis, Volume 1

Abstract

The main motivation for the present work is to reliably perform voice (or signal) detection for a source of interest from a single microphone recording. We rely on the assumption that the input signal contains invariant information about the channel, or transfer function, from each source to the microphone, which can be reliably exploited for signal detection and classification. In this chapter we employ a nonconventional method called generalized mutual interdependence analysis (GMIA), which proposes a model for computing this hidden invariant information present across multiple measurements. Such information turns out to be a good characteristic feature of a signal source, transformation, or composition that fits the model. This chapter introduces a unified and succinct description of the underlying model of GMIA, together with the formulation and solution of the corresponding optimization problem. We apply GMIA to feature extraction in the problem of own-voice activity detection, which aims to classify a near-field channel given prior information about the GMIA features of that channel. Recognizing the presence of voice is extremely challenging in noisy scenarios with interference from music, car noise, or street noise. We compare GMIA with MFCC and cepstral-mean features. For example, GMIA achieves equal error rates below 10% under music interference at SNRs down to −20 dB.

Notes

  1. The instance i implicitly represents the timescale of interest, e.g., a timescale of the order of the pitch period (10–20 ms) or of the order of the average word period (500 ms).

  2. The spectrum of the excitation changes slowly for voiced sounds and appears unchanged although radically different over the duration of a consonant, at the phonetic timescale.

  3. A detailed analysis of these components of the speech production model is beyond present scope.

Author information

Corresponding author

Correspondence to Justinian Rosca.

Copyright information

© 2013 Birkhäuser Boston

About this chapter

Cite this chapter

Claussen, H., Rosca, J., Ramasubramanian, V., Thiyagarajan, S. (2013). Generalized Mutual Interdependence Analysis of Noisy Channels. In: Andrews, T., Balan, R., Benedetto, J., Czaja, W., Okoudjou, K. (eds) Excursions in Harmonic Analysis, Volume 1. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston. https://doi.org/10.1007/978-0-8176-8376-4_18
