Histogram Equalization in SVM Multimodal Person Verification

  • Mireia Farrús
  • Pascual Ejarque
  • Andrey Temko
  • Javier Hernando
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4642)


It has been shown that prosody helps to improve speaker recognition systems based on the voice spectrum. Prosodic features can therefore also be used in multimodal person verification to achieve better results. In this paper, a multimodal recognition system based on facial and vocal tract spectral features is improved by adding prosodic information. The matcher weighting method and support vector machines (SVMs) are used as fusion techniques, and histogram equalization is applied as a normalization technique before SVM fusion. The results show that the performance of an SVM multimodal verification system can be improved by using histogram equalization, especially when the equalization is applied to the scores with the highest equal error rate (EER) values.
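The sketch below is a minimal illustration of two score-level operations named in the abstract. It is not the authors' implementation. `equalize` performs histogram equalization by rank-to-quantile mapping. It replaces each score of one matcher with the value at the same rank position in a reference set of scores. The choice of reference distribution (for instance, the scores of the matcher with the lowest EER) is an assumption. `matcher_weights` follows the usual matcher weighting rule: each matcher's weight is inversely proportional to its EER, and the weights sum to one. `fuse` then takes the weighted sum. All function names are hypothetical. The SVM fusion stage is not shown.

```python
from bisect import bisect_right
from typing import List, Sequence


def equalize(scores: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Histogram-equalize `scores` onto the empirical distribution of `reference`.

    Each score is replaced by the reference quantile at the same rank position,
    so the output follows the reference histogram while preserving score order.
    Assumption: the reference set is chosen by the caller (hypothetical choice).
    """
    src = sorted(scores)
    ref = sorted(reference)
    m, n = len(src), len(ref)
    out = []
    for s in scores:
        k = bisect_right(src, s)        # rank of s: number of scores <= s (1..m)
        idx = (k * n + m - 1) // m - 1  # ceil(k * n / m) - 1, in integer arithmetic
        out.append(ref[min(max(idx, 0), n - 1)])
    return out


def matcher_weights(eers: Sequence[float]) -> List[float]:
    """Matcher weighting: weights inversely proportional to each EER, summing to 1."""
    inverse = [1.0 / e for e in eers]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuse(score_vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted-sum fusion: one fused score per trial from its per-matcher scores."""
    return [sum(w * s for w, s in zip(weights, vec)) for vec in score_vectors]
```

In the setting described in the abstract, `equalize` would be applied to the scores of the matchers with the highest EER before those scores reach the fusion stage. Because the mapping is monotonic, it changes the shape of the score distribution without changing the ranking of trials within a single matcher.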


speaker recognition, multimodality, fusion, support vector machines, histogram equalization, prosody, voice spectrum, face


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Mireia Farrús (1)
  • Pascual Ejarque (1)
  • Andrey Temko (1)
  • Javier Hernando (1)

  1. TALP Research Center, Department of Signal Theory and Communications, Technical University of Catalonia, Barcelona, Catalonia
