Combination of Features for Crosslingual Speaker Identification with the Constraint of Limited Data
Mel frequency cepstral coefficients (MFCC) have proven effective in speaker identification but do not provide satisfactory performance under limited-data conditions. This paper presents a combination of features from different languages for crosslingual speaker identification with the constraint of limited data. However, combined features can increase the complexity of the speaker identification system by doubling the dimensionality of the features. To offset this, frame reduction and smoothing are performed using an adaptive weighted-sum algorithm. Experimental results show that the proposed method gives an average improvement of 11 % in performance over the conventional MFCC method.
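As a rough illustration of the two operations named above, the following sketch concatenates two per-frame feature streams (which doubles the dimensionality) and then replaces each window of frames with a single weighted sum (which reduces the frame count and smooths the features). It is only a sketch under stated assumptions: the function names `combine_features` and `reduce_frames`, the 13-dimensional toy streams, the window size of 2, and the weighting rule (frames closer to the window mean get larger weights) are all hypothetical and are not taken from the paper.

```python
import math


def combine_features(feats_a, feats_b):
    """Concatenate two feature streams frame by frame (hypothetical helper)."""
    if len(feats_a) != len(feats_b):
        raise ValueError("feature streams must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(feats_a, feats_b)]


def reduce_frames(frames, window=2):
    """Replace each non-overlapping window of frames with one weighted sum.

    Assumed weighting rule (not from the paper): a frame's weight is
    1 / (1 + distance to the window mean), normalized to sum to 1.
    """
    reduced = []
    for start in range(0, len(frames), window):
        block = frames[start:start + window]
        dim = len(block[0])
        mean = [sum(f[d] for f in block) / len(block) for d in range(dim)]
        raw = [1.0 / (1.0 + math.dist(f, mean)) for f in block]
        total = sum(raw)
        weights = [w / total for w in raw]
        reduced.append(
            [sum(w * f[d] for w, f in zip(weights, block)) for d in range(dim)]
        )
    return reduced


# Toy data: two streams of 6 frames, 13 coefficients each (made-up values).
stream_a = [[float(i)] * 13 for i in range(6)]
stream_b = [[float(i) + 0.5] * 13 for i in range(6)]

combined = combine_features(stream_a, stream_b)
reduced = reduce_frames(combined, window=2)

print(len(combined[0]), len(combined), len(reduced))  # prints: 26 6 3
```

In this toy run the combined vectors have 26 dimensions instead of 13, while the frame count drops from 6 to 3, so the total amount of data passed to the classifier stays the same as for a single 13-dimensional stream.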
Keywords: Frame reduction, Crosslingual, MFCC
This work is supported by Visvesvaraya Technological University (VTU), Belgaum-590018, Karnataka, India.