Static and dynamic information derived from source and system features for person recognition from humming

Patil, Hemant A.; Madhavi, Maulik C.; Parhi, Keshab K.

doi:10.1007/s10772-012-9161-5

Published: 28 June 2012

Volume 15, pages 393–406, (2012)

International Journal of Speech Technology

Hemant A. Patil¹,
Maulik C. Madhavi¹ &
Keshab K. Parhi²

Abstract

In this paper, a person's hum (instead of normal speech) is used to design a voice biometric system for person recognition. A recently proposed static feature set, viz., Variable length Teager energy based Mel Frequency Cepstral Coefficients (VTMFCC), is found to capture source-like information of a hum signal. VTMFCC is shown to be more effective than the linear prediction (LP) residual at capturing information complementary to MFCC in a hum signal. Person recognition performance is found to be better than with MFCC alone when score-level fusion combines evidence from static and dynamic features for MFCC (system) and VTMFCC (source-like) features. Experiments are carried out with two types of dynamic features, viz., delta cepstrum and shifted delta cepstrum. With score-level fusion of static and dynamic features, the % identification rate and % Equal Error Rate improve by 7.9 % and 0.27 %, respectively, over MFCC alone. Furthermore, the person recognition system is observed to perform better with a larger frame duration of 69.6 ms than with the traditional 10–30 ms frame duration.
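The two kinds of dynamic features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in the paper, so the code below assumes the common simple-difference form of the delta cepstrum and the usual stacked form of the shifted delta cepstrum. The function names `delta` and `shifted_delta`, the parameters `d`, `p` and `k`, the edge clamping, and the toy input are all illustrative assumptions, not the authors' implementation.

```python
from typing import List

Frame = List[float]  # one frame of static cepstral coefficients


def delta(frames: List[Frame], d: int = 1) -> List[Frame]:
    """Simple-difference delta: c[t + d] - c[t - d] for every frame t.

    Indices that fall outside the utterance are clamped to the first or
    last frame (an assumption made here to keep the output length equal
    to the input length).
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        ahead = frames[min(t + d, last)]
        behind = frames[max(t - d, 0)]
        out.append([a - b for a, b in zip(ahead, behind)])
    return out


def shifted_delta(frames: List[Frame], d: int = 1, p: int = 3, k: int = 7) -> List[Frame]:
    """Shifted delta cepstrum: at frame t, concatenate the k delta vectors
    taken at t, t + p, ..., t + (k - 1) * p (clamped at the last frame)."""
    deltas = delta(frames, d)
    last = len(deltas) - 1
    out = []
    for t in range(len(deltas)):
        stacked: Frame = []
        for i in range(k):
            stacked.extend(deltas[min(t + i * p, last)])
        out.append(stacked)
    return out


# Toy input: 6 frames of 2 made-up static coefficients each.
static = [[float(t), float(t * t)] for t in range(6)]
dyn = delta(static, d=1)
sdc = shifted_delta(static, d=1, p=2, k=2)
print(dyn[2])        # delta at frame 2
print(len(sdc[0]))   # stacked dimension = k * number of static coefficients
```

In a setup like the one the abstract describes, static and dynamic vectors of this kind would each feed a classifier, and the resulting scores would then be combined at score level; the fusion rule and its weights are not stated in the abstract and are therefore not sketched here.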

Author information

Authors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India
Hemant A. Patil & Maulik C. Madhavi
Department of Electrical and Computer Engineering, University of Minnesota, Twin Cities Campus, Minneapolis, USA
Keshab K. Parhi

Corresponding author

Correspondence to Hemant A. Patil.

About this article

Cite this article

Patil, H.A., Madhavi, M.C. & Parhi, K.K. Static and dynamic information derived from source and system features for person recognition from humming. Int J Speech Technol 15, 393–406 (2012). https://doi.org/10.1007/s10772-012-9161-5

Received: 21 May 2012
Accepted: 11 June 2012
Published: 28 June 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s10772-012-9161-5
