Abstract
An approach combining a simple local representation method with a k-nearest-neighbors-based direct voting scheme is proposed for speaker recognition. This approach raises computational problems, which we solved effectively with an approximate fast k-nearest-neighbors search technique. Experimental results on the EuTrans and SIVA speech databases are reported, showing the effectiveness of the proposed approach.
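The direct voting scheme summarized in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration and not the authors' implementation. Several choices here are assumptions: the local representation (`local_vectors`, which concatenates `width` consecutive feature frames), the Euclidean distance, and the value of `k`. The sketch also uses an exact brute-force neighbor search in place of the approximate fast k-nearest-neighbors search the abstract mentions.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def local_vectors(frames: Sequence[Vector], width: int) -> List[List[float]]:
    """Hypothetical local representation: concatenate `width` consecutive
    feature frames (e.g. cepstral vectors) into one local feature vector,
    sliding the window one frame at a time."""
    return [
        [x for frame in frames[t:t + width] for x in frame]
        for t in range(len(frames) - width + 1)
    ]


def knn_labels(query: Vector, train_vectors: Sequence[Vector],
               train_labels: Sequence[str], k: int) -> List[str]:
    """Speaker labels of the k training vectors nearest to `query`.

    Exact brute-force Euclidean search: a simple stand-in for the
    approximate fast k-NN search used to make the method tractable."""
    nearest = heapq.nsmallest(
        k, range(len(train_vectors)),
        key=lambda i: math.dist(query, train_vectors[i]),
    )
    return [train_labels[i] for i in nearest]


def direct_voting(test_vectors: Sequence[Vector], train_vectors: Sequence[Vector],
                  train_labels: Sequence[str], k: int = 3) -> str:
    """Each local vector of the test utterance casts one vote per neighbor
    for that neighbor's speaker; the speaker with the most votes wins."""
    votes: Counter = Counter()
    for v in test_vectors:
        votes.update(knn_labels(v, train_vectors, train_labels, k))
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy one-dimensional "frames" for two speakers, A and B.
    train_a = local_vectors([[0.0], [0.1], [0.2], [0.1]], width=2)
    train_b = local_vectors([[10.0], [10.2], [9.9], [10.1]], width=2)
    train_vectors = train_a + train_b
    train_labels = ["A"] * len(train_a) + ["B"] * len(train_b)
    test = local_vectors([[9.8], [10.0], [10.1]], width=2)
    print(direct_voting(test, train_vectors, train_labels, k=3))  # prints B
```

Pooling votes from every local vector means that a few badly matched frames cannot overturn the decision on their own. The brute-force search, however, costs one distance computation per training vector for each test vector. That cost is what motivates replacing it with an approximate fast k-NN search.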
This work was partially supported by the Spanish “Ministerio de Ciencia y Tecnología” (Ministry of Science and Technology) under grants TIC2003-08496-C04-02 and DPI2001-0880-C02-02.
The authors would like to thank FUB (Fondazione Ugo Bordoni) for providing the SIVA corpus.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paredes, R., Vidal, E., Casacuberta, F. (2004). Local Features for Speaker Recognition. In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2004. Lecture Notes in Computer Science, vol 3138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27868-9_120
DOI: https://doi.org/10.1007/978-3-540-27868-9_120
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22570-6
Online ISBN: 978-3-540-27868-9
eBook Packages: Springer Book Archive