Predicting a Cold from Speech Using Fisher Vectors; SVM and XGBoost as Classifiers

  • Conference paper
Speech and Computer (SPECOM 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Abstract

Screening for a cold can help limit its spread. In this study, we present a technique for identifying subjects who have a cold from their speech. To this end, we use frame-level representations of the subjects' recordings. These representations are modelled by a generative Gaussian Mixture Model (GMM), from which the Fisher Vector (FV) approach derives a fixed-length encoding of each recording, i.e. its Fisher vector. We then compare the classification performance of two algorithms: a linear-kernel SVM and an XGBoost classifier. Because the data sets are highly class-imbalanced, we undersample the majority class. Applying Power Normalization (PN) and Principal Component Analysis (PCA) to the FV features proved effective at improving the classification score: the SVM achieved a final Unweighted Average Recall (UAR) of 67.81% on the test set. XGBoost, however, gave better results on the test set using only the raw Fisher vectors, reaching a UAR of 70.43%. The latter classification approach outperformed the original (non-fused) baseline score given in ‘The INTERSPEECH 2017 Computational Paralinguistics Challenge’.
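
Three of the steps named in the abstract (undersampling the majority class, Power Normalization, and scoring by UAR) are simple enough to sketch in plain Python. The block below is an illustration only, not the authors' implementation: the function names, the 0/1 label coding, the choice of random undersampling, and the exponent `alpha=0.5` are all assumptions, since the abstract does not specify them. The GMM fitting, Fisher vector extraction, PCA, and the two classifiers are left out.

```python
import math
import random


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls; each class counts equally, whatever its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def undersample_majority(X, y, seed=0):
    """Randomly drop examples until every class is as small as the rarest one.

    Assumption: plain random undersampling; the abstract does not say
    which undersampling variant was used.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(ix) for ix in by_class.values())
    keep = []
    for ix in by_class.values():
        keep.extend(rng.sample(ix, n_min))
    keep.sort()  # preserve the original ordering of the retained examples
    return [X[i] for i in keep], [y[i] for i in keep]


def power_normalize(v, alpha=0.5):
    """Signed power normalization: sign(x) * |x|**alpha for each component.

    Assumption: alpha=0.5 is a placeholder, not a value taken from the paper.
    """
    return [math.copysign(abs(x) ** alpha, x) for x in v]


# Hypothetical labels: 1 = cold, 0 = no cold.
y_true = [1, 0, 0, 0]
y_pred = [1, 0, 0, 1]
uar = unweighted_average_recall(y_true, y_pred)  # (1/1 + 2/3) / 2
```

UAR is a natural score for imbalanced data because a classifier that always predicts the majority class gets only 50% on a two-class task, however rare the minority class is.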

Acknowledgments

This study was partially funded by the National Research, Development and Innovation Office of Hungary via contract NKFIH FK-124413 and by the Ministry for Innovation and Technology, Hungary (grant TUDFO/47138-1/2019-ITM). G. Gosztolya was also funded by the János Bolyai Scholarship of the Hungarian Academy of Sciences and by the Hungarian Ministry of Innovation and Technology New National Excellence Program ÚNKP-20-5.

Author information

Corresponding author

Correspondence to José Vicente Egas-López.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Egas-López, J.V., Gosztolya, G. (2020). Predicting a Cold from Speech Using Fisher Vectors; SVM and XGBoost as Classifiers. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_15

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science (R0)
