Generalized Mutual Interdependence Analysis of Noisy Channels

Claussen, Heiko; Rosca, Justinian; Ramasubramanian, Viswanathan; Thiyagarajan, Subramani

doi:10.1007/978-0-8176-8376-4_18

Part of the book series: Applied and Numerical Harmonic Analysis (ANHA)

Abstract

The main motivation for this work is to reliably detect the voice (or signal) of a source of interest from a single microphone recording. We assume that the input signal contains invariant information about the channel, that is, the transfer function from each source to the microphone, and that this information can be exploited for signal detection and classification. In this chapter we employ a nonconventional method called generalized mutual interdependence analysis (GMIA), which proposes a model for computing this hidden invariant information present across multiple measurements. Such information turns out to be a good characteristic feature of a signal source, transformation, or composition that fits the model. The chapter gives a unified and succinct description of the underlying GMIA model and of the formulation and solution of the corresponding optimization problem. We apply GMIA to feature extraction for own-voice activity detection, which aims to classify a near-field channel using prior information about the GMIA features of that channel. Recognizing the presence of voice is extremely challenging in noisy scenarios with interference from music, car noise, or street noise. We compare GMIA with mel-frequency cepstral coefficient (MFCC) and cepstral-mean features. For example, GMIA achieves equal error rates below 10% under music interference at SNRs down to −20 dB.
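The equal error rate quoted in the abstract is the operating point at which the false-accept and false-reject rates of a detector coincide. The following is a minimal sketch of how such a figure could be estimated from detector scores. The function name `equal_error_rate` and the score lists are hypothetical illustrations, not data or code from the chapter, and the chapter's own evaluation procedure may differ.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the equal error rate (EER) by sweeping a decision threshold.

    target_scores: detector scores for frames that contain the source of interest.
    nontarget_scores: detector scores for frames that contain only interference.
    Returns (eer, threshold) at the threshold where the false-accept and
    false-reject rates are closest; the EER is reported as their mean.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # A frame is accepted as "own voice" when its score is >= t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical scores, chosen only to illustrate the computation.
own_voice = [0.9, 0.8, 0.75, 0.6, 0.4]
interference = [0.7, 0.5, 0.3, 0.2, 0.1]

eer, thr = equal_error_rate(own_voice, interference)
print(f"EER = {eer:.2f} at threshold {thr}")
```

With these made-up scores, one own-voice frame out of five is rejected and one interference frame out of five is accepted at threshold 0.6, so both error rates equal 20%.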

Notes

1. The instance i implicitly represents the timescale of interest, e.g., a timescale of the order of the pitch period (10–20 ms) or of the order of the average word period (500 ms).
2. The spectrum of the excitation changes slowly for voiced sounds and appears unchanged although radically different over the duration of a consonant, at the phonetic timescale.
3. A detailed analysis of these components of the speech production model is beyond the present scope.

Author information

Authors and Affiliations

Siemens Corporation, Corporate Research, 755 College Road East, Princeton, NJ, 08540, USA
Heiko Claussen & Justinian Rosca
Siemens Corporate Research and Technologies-India, Bangalore, India
Viswanathan Ramasubramanian & Subramani Thiyagarajan

Corresponding author

Correspondence to Justinian Rosca.

Editor information

Editors and Affiliations

Department of Mathematics, University of Maryland, Norbert Wiener Center, College Park, 20742-0001, Maryland, USA
Travis D. Andrews
Norbert Wiener Center, Department of Mathematics, University of Maryland, College Park, 20742, Maryland, USA
Radu Balan
Department of Mathematics, University of Maryland, College Park, 20742-4015, Maryland, USA
John J. Benedetto
Department of Mathematics, University of Maryland, College Park, 20742-4015, Maryland, USA
Wojciech Czaja
Department of Mathematics, University of Maryland, College Park, 20742, Maryland, USA
Kasso A. Okoudjou

About this chapter

Cite this chapter

Claussen, H., Rosca, J., Ramasubramanian, V., Thiyagarajan, S. (2013). Generalized Mutual Interdependence Analysis of Noisy Channels. In: Andrews, T., Balan, R., Benedetto, J., Czaja, W., Okoudjou, K. (eds) Excursions in Harmonic Analysis, Volume 1. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston. https://doi.org/10.1007/978-0-8176-8376-4_18

DOI: https://doi.org/10.1007/978-0-8176-8376-4_18
Published: 20 November 2012
Publisher Name: Birkhäuser, Boston
Print ISBN: 978-0-8176-8375-7
Online ISBN: 978-0-8176-8376-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
