Abstract
The main motivation for the present work is to reliably perform voice (or signal) detection for a source of interest from a single microphone recording. We rely on the assumption that the input signal contains invariant information about the channel, that is, the transfer function from each source to the microphone, which can be reliably exploited for signal detection and classification. In this chapter we employ a nonconventional method called generalized mutual interdependence analysis (GMIA), which proposes a model for computing this hidden invariant information present across multiple measurements. Such information turns out to be a good characteristic feature of a signal source, transformation, or composition that fits the model. This chapter gives a unified and succinct description of the underlying model of GMIA, together with the formulation and solution of the corresponding optimization problem. We apply GMIA to feature extraction in the problem of own-voice activity detection, which aims to classify a near-field channel based on prior information about the GMIA features of that channel. Recognizing the presence of voice is extremely challenging in noisy scenarios with interference from music, car noise, or street noise. We compare GMIA with MFCC and cepstral-mean features. For example, GMIA achieves equal error rates below 10% under music interference at SNRs down to −20 dB.
Notes
1. The instance i implicitly represents the timescale of interest, e.g., a timescale of the order of the pitch period (10–20 ms) or of the order of the average word period (500 ms).
2. The spectrum of the excitation changes slowly for voiced sounds and appears unchanged, although radically different, over the duration of a consonant, at the phonetic timescale.
3. A detailed analysis of these components of the speech production model is beyond the present scope.
Copyright information
© 2013 Birkhäuser Boston
About this chapter
Cite this chapter
Claussen, H., Rosca, J., Ramasubramanian, V., Thiyagarajan, S. (2013). Generalized Mutual Interdependence Analysis of Noisy Channels. In: Andrews, T., Balan, R., Benedetto, J., Czaja, W., Okoudjou, K. (eds) Excursions in Harmonic Analysis, Volume 1. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston. https://doi.org/10.1007/978-0-8176-8376-4_18
DOI: https://doi.org/10.1007/978-0-8176-8376-4_18
Publisher Name: Birkhäuser, Boston
Print ISBN: 978-0-8176-8375-7
Online ISBN: 978-0-8176-8376-4
eBook Packages: Mathematics and Statistics (R0)