Abstract
Screening for a cold can help limit its spread. In this study, we present a technique for identifying subjects who have a cold from their speech. To this end, we use frame-level representations of the subjects' recordings. These representations are modeled by a generative Gaussian Mixture Model (GMM), from which the Fisher Vector (FV) approach derives a fixed-length encoding of each recording, i.e. its Fisher vector. We then compare the classification performance of two algorithms: a linear-kernel SVM and an XGBoost classifier. Because the data sets are highly class-imbalanced, we undersample the majority class. Applying Power Normalization (PN) and Principal Component Analysis (PCA) to the FV features proved effective at improving the classification score: the SVM achieved a final Unweighted Average Recall (UAR) of 67.81% on the test set. XGBoost, however, gave better results on the test set using only the raw Fisher vectors, and with this combination we achieved a UAR of 70.43%. The latter classification approach outperformed the original (non-fused) baseline score given in ‘The INTERSPEECH 2017 Computational Paralinguistics Challenge’.
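Three of the auxiliary steps named in the abstract (power normalization of an encoding, undersampling of the majority class, and scoring by UAR) can be illustrated with a minimal sketch. The function names, the exponent default of 0.5, and the random seed are assumptions made for illustration only; this is not the authors' implementation, and it does not cover the GMM fitting, Fisher vector extraction, PCA, or the classifiers themselves.

```python
import math
import random
from collections import defaultdict


def power_normalize(fv, alpha=0.5):
    """Signed power normalization: sign(x) * |x|**alpha per component.

    alpha=0.5 is an assumed, commonly used default, not a value from the paper.
    """
    return [math.copysign(abs(x) ** alpha, x) for x in fv]


def undersample(features, labels, seed=0):
    """Randomly drop examples so each class keeps as many as the rarest class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_keep = min(len(idxs) for idxs in by_class.values())
    rng = random.Random(seed)
    kept = sorted(
        i for idxs in by_class.values() for i in rng.sample(idxs, n_keep)
    )
    return [features[i] for i in kept], [labels[i] for i in kept]


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # Toy imbalanced labels: 9 "no cold" (0) examples versus 1 "cold" (1).
    y_true = [0] * 9 + [1]
    always_majority = [0] * 10
    # Accuracy would be 0.9 here, but UAR exposes the missed minority class.
    print(unweighted_average_recall(y_true, always_majority))  # 0.5
```

The toy example shows why UAR is the preferred metric under class imbalance: a classifier that always predicts the majority class reaches high accuracy but only chance-level UAR.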
References
Arandjelovic, R., Zisserman, A.: All about VLAD. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1578–1585 (2013)
Cai, D., Ni, Z., Liu, W., Cai, W., Li, G., Li, M.: End-to-end deep learning framework for speech paralinguistics detection based on perception aware spectrum. In: Proceedings of Interspeech, pp. 3452–3456 (2017)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011)
Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: British Machine Vision Conference, vol. 2, pp. 76.1–76.12, November 2011
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
Egas-López, J.V., Orozco-Arroyave, J.R., Gosztolya, G.: Assessing Parkinson’s disease from speech using Fisher vectors. In: Proceedings of Interspeech (2019)
Egas López, J.V., Tóth, L., Hoffmann, I., Kálmán, J., Pákáski, M., Gosztolya, G.: Assessing Alzheimer’s disease from speech using the i-vector approach. In: Salah, A.A., Karpov, A., Potapova, R. (eds.) SPECOM 2019. LNCS (LNAI), vol. 11658, pp. 289–298. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26061-3_30
Friedman, J.H.: Greedy function approximation: a Gradient Boosting Machine. Ann. Stat. 29, 1189–1232 (2001)
Gosztolya, G.: Using the Fisher vector representation for audio-based emotion recognition. Acta Polytechnica Hungarica 17(6), 7–23 (2020)
Gosztolya, G., Bagi, A., Szalóki, S., Szendi, I., Hoffmann, I.: Identifying schizophrenia based on temporal parameters in spontaneous speech. In: Proceedings of Interspeech, Hyderabad, India, pp. 3408–3412, September 2018
Gosztolya, G., Busa-Fekete, R., Grósz, T., Tóth, L.: DNN-based feature extraction and classifier combination for child-directed speech, cold and snoring identification. In: Proceedings of Interspeech, Stockholm, Sweden, pp. 3522–3526, August 2017
Gosztolya, G., Grósz, T., Szaszák, G., Tóth, L.: Estimating the sincerity of apologies in speech by DNN rank learning and prosodic analysis. In: Proceedings of Interspeech, San Francisco, CA, USA, pp. 2026–2030, September 2016
Gosztolya, G., Grósz, T., Tóth, L.: General utterance-level feature extraction for classifying crying sounds, atypical and self-assessed affect and heart beats. In: Proceedings of Interspeech, Hyderabad, India, pp. 531–535, September 2018
Huckvale, M., Beke, A.: It sounds like you have a cold! Testing voice features for the Interspeech 2017 computational paralinguistics cold challenge. In: Proceedings of Interspeech, International Speech Communication Association (ISCA) (2017)
Jaakkola, T.S., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Proceedings of NIPS, Denver, CO, USA, pp. 487–493 (1998)
Kaya, H., Karpov, A.A.: Introducing weighted kernel classifiers for handling imbalanced paralinguistic corpora: snoring, addressee and cold. In: Interspeech, pp. 3527–3531 (2017)
Kaya, H., Karpov, A.A., Salah, A.A.: Fisher vectors with cascaded normalization for paralinguistic analysis. In: Proceedings of Interspeech, pp. 909–913 (2015)
Lemaître, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18(1), 559–563 (2017)
Long, J.M., Yan, Z.F., Shen, Y.L., Liu, W.J., Wei, Q.Y.: Detection of Epilepsy using MFCC-based feature and XGBoost. In: 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1–4. IEEE (2018)
Moreno, P.J., Rifkin, R.: Using the Fisher kernel method for web audio classification. In: Proceedings of ICASSP, Dallas, TX, USA, pp. 2417–2420 (2010)
Natekin, A., Knoll, A.: Gradient boosting machines, a tutorial. Front. Neurorob. 7, 21 (2013)
Peng, X., Wang, L., Wang, X., Qiao, Y.: Bag of visual words and fusion methods for action recognition: comprehensive study and good practice. Comput. Vis. Image Underst. 150, 109–125 (2016)
Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2007. https://doi.org/10.1109/CVPR.2007.383266
Rosenberg, A.: Classifying skewed data: importance weighting to optimize average recall. In: Proceedings of Interspeech, pp. 2242–2245 (2012)
Sánchez, J., Perronnin, F., Mensink, T., Verbeek, J.: Image classification with the fisher vector: theory and practice. Int. J. Comput. Vision 105(3), 222–245 (2013). https://doi.org/10.1007/s11263-013-0636-x
Schuller, B., et al.: The Interspeech 2017 computational paralinguistics challenge: addressee, cold and snoring. In: Computational Paralinguistics Challenge (ComParE), Interspeech 2017, pp. 3442–3446 (2017)
Schuller, B.W., Batliner, A.M.: Emotion, Affect and Personality in Speech and Language Processing. Wiley, Hoboken (2013)
Seeland, M., Rzanny, M., Alaqraa, N., Wäldchen, J., Mäder, P.: Plant species classification using flower images: a comparative study of local feature representations. PLOS ONE 12(2), 1–29 (2017)
Smith, D.C., Kornelson, K.A.: A comparison of Fisher vectors and Gaussian supervectors for document versus non-document image classification. In: Applications of Digital Image Processing XXXVI, vol. 8856, p. 88560N. International Society for Optics and Photonics (2013)
Tian, Y., He, L., Li, Z.Y., Wu, W.L., Zhang, W.Q., Liu, J.: Speaker verification using Fisher vector. In: Proceedings of ISCSLP, Singapore, pp. 419–422 (2014)
Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms. In: Proceedings of the 18th ACM International Conference on Multimedia, pp. 1469–1472. ACM (2010)
Wang, C., Deng, C., Wang, S.: Imbalance-XGBoost: leveraging weighted and focal losses for binary label-imbalanced classification with XGBoost. arXiv preprint arXiv:1908.01672 (2019)
Wang, S.-H., Li, H.-T., Chang, E.-J., Wu, A.-Y.A.: Entropy-assisted emotion recognition of valence and arousal using XGBoost classifier. In: Iliadis, L., Maglogiannis, I., Plagianakos, V. (eds.) AIAI 2018. IAICT, vol. 519, pp. 249–260. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92007-8_22
Zajíc, Z., Hrúz, M.: Fisher Vectors in PLDA speaker verification system. In: Proceedings of ICSP, Chengdu, China, pp. 1338–1341 (2016)
Acknowledgments
This study was partially funded by the National Research, Development and Innovation Office of Hungary via contract NKFIH FK-124413 and by the Ministry for Innovation and Technology, Hungary (grant TUDFO/47138-1/2019-ITM). G. Gosztolya was also funded by the János Bolyai Scholarship of the Hungarian Academy of Sciences and by the Hungarian Ministry of Innovation and Technology New National Excellence Program ÚNKP-20-5.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Egas-López, J.V., Gosztolya, G. (2020). Predicting a Cold from Speech Using Fisher Vectors; SVM and XGBoost as Classifiers. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_15
DOI: https://doi.org/10.1007/978-3-030-60276-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)