Evolutionary Intelligence, Volume 10, Issue 1–2, pp 1–21

Feature selection for speaker verification using genetic programming

  • Róisín Loughran
  • Alexandros Agapitos
  • Ahmed Kattan
  • Anthony Brabazon
  • Michael O’Neill
Research Paper

Abstract

We present a study examining feature selection from high-performing models evolved using genetic programming (GP) on the problem of automatic speaker verification (ASV). ASV is a highly unbalanced binary classification problem in which a given speaker must be verified against everyone else. We evolve classification models for 10 individual speakers using a variety of fitness functions and data sampling techniques, and examine the generalisation of each model on a 1:9 unbalanced set. A significant difference between training and test performance is found, which may indicate overfitting in the models. Using only the best-generalising models, we examine two methods for selecting the most important features. We then compare the performance of a number of tuned machine learning classifiers using the full set of 275 features and a reduced set of 20 features obtained from each feature selection method. In the majority of experiments, using only the top 20 features found in high-performing GP programs led to test classification performance as good as, or better than, that obtained using all 275 features. Classification accuracy varies considerably between speakers across all experiments, showing that some speakers are easier to classify than others. This indicates that, in such real-world classification problems, the content and quality of the original data strongly influence the quality of the results obtainable.
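The abstract does not say how the two feature selection methods work. As a minimal, hypothetical sketch of one plausible frequency-based scheme (an assumption, not necessarily either of the authors' methods), the Python below represents each evolved GP program as a nested tuple whose terminals are feature names such as "f12" or numeric constants, counts how many of the best-generalising programs use each feature, and keeps the k most frequently used ones (k = 20 in the study). The tree encoding and the names `features_in` and `top_k_features` are illustrative only.

```python
from collections import Counter
from typing import Iterable, Union

# Hypothetical encoding: a GP program is either a terminal (a feature name
# string such as "f12", or a numeric constant) or a tuple
# (operator, child, child, ...).
Node = Union[str, float, tuple]


def features_in(program: Node) -> set:
    """Return the set of distinct feature terminals used by one program."""
    if isinstance(program, tuple):
        used = set()
        for child in program[1:]:  # program[0] is the operator, not a feature
            used |= features_in(child)
        return used
    if isinstance(program, str):
        return {program}
    return set()  # numeric constants are not features


def top_k_features(programs: Iterable[Node], k: int = 20) -> list:
    """Rank features by the number of programs that use them; keep the top k."""
    counts = Counter()
    for prog in programs:
        counts.update(features_in(prog))  # each feature counted once per program
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    best_programs = [
        ("+", "f3", ("*", "f7", 0.5)),
        ("-", "f3", "f12"),
        ("max", "f7", ("+", "f3", "f1")),
    ]
    print(top_k_features(best_programs, k=2))  # ['f3', 'f7']
```

Counting each feature once per program, rather than once per occurrence, keeps a single large program that reuses one feature many times from dominating the ranking; ties are broken alphabetically so the selected subset is reproducible.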

Keywords

Speaker verification, Feature selection, Unbalanced data, Genetic programming

Notes

Acknowledgements

This work was funded by Science Foundation Ireland (Grant Nos. 13/IA/1850 and 08/SRC/FM1389).

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Natural Computing Research and Applications Group (NCRA), University College Dublin, Dublin, Ireland
  2. Computer Science Department, Um Al-Qura University, Mecca, Saudi Arabia