Speaker Identification in a Multi-speaker Environment
Human beings are capable of remarkably complex tasks; for example, a person can focus on a single speaker's voice in an environment of simultaneous conversations. We have tried to emulate this skill with an artificial intelligence system. As a first step, our system classifies an audio file as single-speaker or multi-speaker; it then recognizes the speaker(s). The input audio file is first pre-processed: it is subjected to reduction and silence removal, framing, windowing and DCT calculation, all of which are used to extract its features. The Mel Frequency Cepstral Coefficients (MFCC) technique was used for feature extraction. The extracted features are then used to train a neural network with the Error Back Propagation Training Algorithm (EBPTA). Applications of our model include biometric systems such as telephone banking, authentication and surveillance.
Keywords: Speaker identification, Neural network, Multi-speaker, Mel frequency cepstral coefficients (MFCC)
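The feature-extraction steps named in the abstract (framing, windowing, DCT calculation) can be illustrated with a small sketch. This is not the authors' implementation. It is a minimal, standard-library-only outline of a typical MFCC computation. The frame length, hop size, filter count, coefficient count, the Hamming window, and all function names are assumptions chosen for illustration. Silence removal and the "reduction" step are omitted.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (an assumed choice of window)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    """Power spectrum via a naive O(n^2) DFT; a real system would use an FFT."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for b in range(lo, mid):          # rising edge
            filt[b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):          # falling edge
            filt[b] = (hi - b) / (hi - mid)
        bank.append(filt)
    return bank

def dct2(values, n_out):
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc(signal, sample_rate, frame_len=64, hop=32, n_filters=12, n_coeffs=8):
    """Return one list of n_coeffs cepstral coefficients per frame."""
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        spec = power_spectrum(windowed)
        energies = [sum(f * s for f, s in zip(filt, spec)) for filt in bank]
        # Small epsilon guards against log(0) for empty filters.
        log_energies = [math.log(e + 1e-10) for e in energies]
        features.append(dct2(log_energies, n_coeffs))
    return features

# Usage on a synthetic 440 Hz tone (hypothetical test input).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
feats = mfcc(signal, sample_rate)
```

Each row of `feats` is the feature vector for one frame. Vectors of this kind would be the inputs to a neural network trained with back-propagation, as the abstract describes.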
We offer our special thanks to Mr. Arun Kulkarni, our Head of Department (Information Technology), for his cooperation and unconditional support. As our teacher, he shared useful insights and extended a helping hand whenever it was required.
We are highly indebted to the faculty of Thadomal Shahani Engineering College for their guidance and constant supervision, for providing necessary information regarding the project, and for their support in completing it.