Audio Processing and Speech Recognition, pp 45-66
Feature Extraction
Abstract
Feature extraction is a prerequisite for classifying any audio or speech signal. The analog speech signal s(t) is first sampled a fixed number of times per second (the sampling rate) so that it can be stored on a recording device or a computer.
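The sampling step described above can be illustrated with a minimal sketch. The chapter does not prescribe any of the following: the function names `sample_signal` and `tone`, the 440 Hz test tone, the 8000 Hz sampling rate, and the 10 ms duration are all assumptions chosen for illustration.

```python
import math


def sample_signal(s, sample_rate_hz, duration_s):
    """Return the samples x[n] = s(n / sample_rate_hz) for n = 0 .. N-1.

    `s` is any callable standing in for the analog signal s(t).
    """
    n_samples = int(round(sample_rate_hz * duration_s))
    return [s(n / sample_rate_hz) for n in range(n_samples)]


def tone(t):
    # Hypothetical stand-in for s(t): a 440 Hz sine wave (assumed for illustration).
    return math.sin(2 * math.pi * 440.0 * t)


# Sample 10 ms of the tone at an assumed rate of 8000 samples per second.
samples = sample_signal(tone, sample_rate_hz=8000, duration_s=0.01)
```

The resulting list of numbers is the discrete signal that would be stored, and it is this sequence, not s(t) itself, that a feature extraction stage would operate on.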
Copyright information
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019